Louhi User's Guide, the 2nd Edition > Program development > Compilers > PathScale Compiler Suite

PathScale Compiler Suite

This subsection describes the usage of PathScale compilers: Fortran 90/95, Fortran 77, C and C++.

N.B. Contrary to what was announced earlier, the PathScale Compiler Suite remains in use on Louhi after the end of August 2009.

The current default PathScale version on Louhi is 3.2. To use the PathScale Fortran, C and C++ compilers, you must have the programming environment PrgEnv-pathscale loaded. You can swap to this module from PrgEnv-pgi, which must be unloaded before you can use the PathScale compilers. Please note that although the man pages pathf90, pathcc etc. exist, their main purpose is to document how the compiler command is invoked. The complete man page, which explains all the options of the PathScale compiler suite, is eko.

CNL (Compute Node Linux) applications must be compiled with the Cray compiler wrappers ftn, cc and CC. The wrapper f77 is not supported with PathScale, but ftn can be used instead to compile FORTRAN 77 programs.

File suffixes

The same file suffixes, e.g. .f, .F, .c and .C, can be used as for the PGI compilers (see section Compilers), with the same meanings.

Compiling programs

Programs are compiled in the usual manner, but there is a bug in the default library linking options that affects some programs. If you cannot link your program and receive an error message such as "...multiple definition of something..." or "Warning: size of symbol something changed from...", there is a workaround: add the flags "-Wl,-z,muldefs" to the linker command line, for example:
cc -Ofast -Wl,-z,muldefs -o program program.c
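
The example command uses -Ofast, which according to the options table below implies -ffast-math and therefore relaxed IEEE rules. The following is a minimal sketch of one reason why relaxed rules can change results: floating-point addition is not associative, so summing the same values in a different order may give a different answer. The function names are hypothetical, and this guide does not state which reorderings PathScale actually performs on a given loop.

```c
/* Hypothetical illustration; these names do not come from the guide. */
#include <assert.h>
#include <stddef.h>

/* Sum v[0..n-1] strictly from left to right. */
double sum_forward(const double *v, size_t n)
{
    double s = 0.0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Sum the same values from right to left, i.e. with the additions
 * associated in the opposite order. */
double sum_backward(const double *v, size_t n)
{
    double s = 0.0;
    assert(v != NULL || n == 0);
    for (size_t i = n; i > 0; i--)
        s += v[i - 1];
    return s;
}
```

With v = {1.0, 1e17, -1e17} and IEEE double precision arithmetic, sum_forward returns 0.0 because the 1.0 is lost when it is added to 1e17, whereas sum_backward returns 1.0. If a program's results are sensitive to effects like this, comparing a run built with -fno-fast-math against the -Ofast build is a reasonable first check.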

Compiler Options

The Cray wrappers accept most (if not all) of the command line options of the underlying compiler. The following table lists some useful PathScale compiler options.
Option  Description
-march=barcelona AMD Barcelona processor in 64-bit mode. Added automatically by the module xtpe-barcelona.
-O{0|1|2|3} Optimization level; a greater number means more optimization. Please use at least -O2 in production runs. Level 3 may alter the semantics of the program, but this usually does not cause problems in correctly written code. Mostly it just reduces precision by a digit or two and stops signaling overflows and underflows. See the eko man page (man eko) for a fuller explanation.
-Ofast Shortcut option which provides a basic set of optimization flags; equivalent to -O3 -ipa -OPT:Ofast -fno-math-errno -ffast-math.
-ipa|-IPA Turns on interprocedural analysis. This means that the bulk of optimization is done in the linking phase instead of compilation as usual. This has the advantage that the optimizer sees the whole program, including possible library routines and can analyze the program flow and improve the optimization. Also, IPA allows inlining of routines from different compilation units, something not normally done without IPA. The eko man page describes the suboptions to -IPA:<suboptions>.
-OPT:Ofast Shortcut option which provides a basic set of optimization parameters. This option requires at least -O2. There are many more suboptions to -OPT: and they are documented in the eko man page.
-fno-math-errno Modifies math functions that are executed with a single instruction, e.g. sqrt, so that they do not set errno. A program that relies on IEEE exceptions for math error handling may want to use this flag for speed while maintaining IEEE arithmetic compatibility.
-ffast-math Improves floating point speed by relaxing ANSI & IEEE rules. -fno-fast-math tells the compiler to conform to ANSI and IEEE math rules at the expense of speed. -ffast-math implies -OPT:IEEE_arithmetic=2 -fno-math-errno. -fno-fast-math implies -OPT:IEEE_arithmetic=1 -fmath-errno.
-LNO:<suboptions> LNO is the loop nest optimizer. It is only enabled at -O3 and higher. Some of its suboptions are documented below, the rest can be found in the eko man page. The LNO contains the most powerful optimization algorithms of the PathScale compiler.
-LNO:fusion={0|1|2} Perform loop fusion: 0 = Loop fusion is off, 1 = Perform conservative loop fusion (this is the default), 2 = Perform aggressive loop fusion. This option is perhaps the most beneficial to most programs, especially those that have loops within loops (loop nests). It is worth at least a try for all programs.
-LNO:fission={0|1|2} Perform loop fission. As the name suggests, this is the opposite of loop fusion. The levels are: 0 = Loop fission is off (default), 1 = Perform normal fission as necessary, 2 = Specify that fission be tried before fusion. Please note that since fusion and fission can potentially annihilate each other's effect, using them simultaneously is usually not beneficial.
-LNO:simd={0|1|2} This flag controls inner loop vectorization, which makes use of the SIMD (single instruction, multiple data) instructions provided by the native processor. The levels are: 0 = Turn off the vectorizer (you should never have a reason to do this), 1 = Vectorize only if the compiler can determine that there is no undesirable performance impact due to sub-optimal alignment and that vectorization does not introduce accuracy problems with floating-point operations (default), 2 = Vectorize without any constraints (most aggressive).
-LNO:vintr={0|1|2} This flag controls loop vectorization that makes use of vector intrinsic routines. This is completely different from SIMD. A vector routine replaces functions such as sin, cos, exp and log. A vector routine is called once to compute a math intrinsic for an entire vector (or, in some cases, array). -LNO:vintr=1 is the default. -LNO:vintr=0 turns off the vintr optimization. Under -LNO:vintr=2 the compiler will do aggressive optimization for all vector intrinsic routines. Note that -LNO:vintr=2 could be unsafe in that some of these routines could have accuracy problems.
This option is always worth trying if your code uses any functions from the C/C++ math library (linked with -lm) or the Fortran intrinsic math functions. A typical example would be computing the sine of a large vector or array. The usual cosine takes 92 CPU cycles, while the vector version can take as little as 43 cycles per value, so if you have lots of math function calls, try vectorizing them. Furthermore, the accuracy problems are usually encountered only for arguments with extremely large or small absolute values (i.e. very close to the maximum or minimum representable values).
-LNO:vintr_verbose={0|1} Default is 0; if set to 1, output information on vectorization of math functions.
-LNO:simd_verbose={0|1} Default is 0; if set to 1, output information on SIMD vectorization.
-C (For Fortran) Perform runtime subscript range checking. Subscripts that are out of range cause fatal runtime errors. If you set the F90_BOUNDS_CHECK_ABORT environment variable to YES, the program aborts.
(For C/C++) Keep comments after preprocessing.
-Wl,-z,muldefs Works around a library bug, which affects linking of some programs. Note that -Wl,-z,muldefs is the format given to cc and ftn; if you call the linker directly, the option is simply "-zmuldefs".
N.B. The suboptions of -LNO, -IPA and -OPT can be combined in a single option, like this: -LNO:vintr=2:fusion=2:simd=2.
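
To make the -LNO:fusion and -LNO:fission descriptions in the table more concrete, the following sketch shows by hand what fusing two loops means. It only illustrates the transformation; it does not show the compiler's actual output. The function names are hypothetical, and the restrict qualifiers encode the assumption that the three arrays do not overlap, without which the two versions would not be interchangeable.

```c
/* Hypothetical illustration of loop fusion; names are not from the guide. */
#include <assert.h>
#include <stddef.h>

/* Unfused form: two separate passes over the data. The second loop
 * has to read a[] and c[] again after the first loop has finished. */
void scale_then_add_unfused(double *restrict a, double *restrict b,
                            const double *restrict c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = 2.0 * c[i];
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] + c[i];
}

/* Fused form: one pass. a[i] and c[i] are reused while they are
 * still in registers or cache. Fission is the reverse rewrite. */
void scale_then_add_fused(double *restrict a, double *restrict b,
                          const double *restrict c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        a[i] = 2.0 * c[i];
        b[i] = a[i] + c[i];
    }
}
```

Both functions produce identical arrays. When code is written in the unfused style, options such as -O3 -LNO:fusion=2 ask the loop nest optimizer to attempt this kind of rewrite automatically.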
 
For the quad-core AMD Opteron Barcelona processors there is an option -march=barcelona. It is added automatically by the module xtpe-barcelona, which should have been loaded automatically when you logged in to Louhi. This option first appeared in version 3.1 of the PathScale compiler suite. When it is on, the use of SSE3 and the new SSE4A instructions is enabled in addition to the normal SSE and SSE2 instructions. For the quad-core Shanghai and six-core Istanbul AMD processors there are not yet any explicit options in PathScale, so the option -march=barcelona is used for them as well.
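
The SSE instructions mentioned above operate on several values at once, and the -LNO:simd vectorizer described in the options table targets inner loops that can use them. As a rough sketch, the first loop below is the kind usually regarded as a good candidate for SIMD vectorization (unit stride, independent iterations), while the second carries a dependence from one iteration to the next and is typically not vectorized directly. The function names are hypothetical, and whether PathScale vectorizes a particular loop is best checked with -LNO:simd_verbose=1.

```c
/* Hypothetical illustration; names are not from the guide. */
#include <assert.h>
#include <stddef.h>

/* y[i] = alpha * x[i] + y[i]. Every iteration is independent and the
 * arrays are accessed with unit stride. restrict states the assumption
 * that x and y do not overlap. */
void daxpy(size_t n, double alpha,
           const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}

/* In-place running sum. Iteration i needs the result of iteration
 * i - 1, which is a loop-carried dependence. */
void prefix_sum(size_t n, double *y)
{
    assert(y != NULL);
    for (size_t i = 1; i < n; i++)
        y[i] = y[i] + y[i - 1];
}
```

Assuming the functions are saved in a file called loops.c (a hypothetical name), a command along the lines of cc -O3 -LNO:simd_verbose=1 -c loops.c should report what the vectorizer decided for each loop.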