General
The programming environment on Louhi is equipped with several numerical subroutine libraries: LibSci (Cray XT Series LibSci), ACML (AMD Core Math Library), FFTW (Fast Fourier Transform library), and PETSc (library for sparse solvers). The LibSci and FFTW modules are loaded as part of the PrgEnv environments.
LibSci
LibSci is the numerical subroutine library for parallel computing. The name LibSci originally stood for Cray Scientific Library. The current LibSci is tuned for XT machines and is called the Cray XT Scientific Library, XT-LibSci.
The LibSci library includes:
- BLAS: Basic Linear Algebra Subprograms
- LAPACK: Linear Algebra Package
- BLACS: Basic Linear Algebra Communication Subprograms
- ScaLAPACK: a parallel version of LAPACK
- IRT: Iterative Refinement Toolkit
- SuperLU for solving large sparse linear systems
BLACS and ScaLAPACK can be used in parallel programs based on MPI or SHMEM.
On the Cray XT Series, only the distributed-memory version of SuperLU is supported.
Linear algebra
BLAS and LAPACK are standard libraries for linear algebra operations. For documentation, see http://www.netlib.org/blas and http://www.netlib.org/lapack.
BLAS operations are divided into three classes:
- Level 1: operations on vectors (one or more)
- Level 2: operations involving a vector and a matrix
- Level 3: operations between two matrices
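As an illustration of the three levels, the following C sketch gives plain reference loops for one operation per level. The function names ref_daxpy, ref_dgemv and ref_dgemm are hypothetical and are not LibSci entry points; they only mirror what the BLAS routines DAXPY, DGEMV and DGEMM compute in the simplest case (unit strides, no transposition). Matrices are assumed to be stored column by column, as in Fortran.

```c
#include <assert.h>
#include <stddef.h>

/* Level 1 (vector-vector): y := alpha*x + y, cf. DAXPY. */
void ref_daxpy(int n, double alpha, const double *x, double *y)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y := alpha*A*x + beta*y, cf. DGEMV.
 * A is m-by-n in column-major order: element (i,j) is a[i + j*m]. */
void ref_dgemv(int m, int n, double alpha, const double *a,
               const double *x, double beta, double *y)
{
    assert(m >= 0 && n >= 0);
    for (int i = 0; i < m; i++)
        y[i] *= beta;
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            y[i] += alpha * a[i + (size_t)j * m] * x[j];
}

/* Level 3 (matrix-matrix): C := alpha*A*B + beta*C, cf. DGEMM.
 * A is m-by-k, B is k-by-n, C is m-by-n, all column-major. */
void ref_dgemm(int m, int n, int k, double alpha, const double *a,
               const double *b, double beta, double *c)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            c[i + (size_t)j * m] *= beta;
    for (int j = 0; j < n; j++)
        for (int p = 0; p < k; p++)
            for (int i = 0; i < m; i++)
                c[i + (size_t)j * m] +=
                    alpha * a[i + (size_t)p * m] * b[p + (size_t)j * k];
}
```

The loop nests grow from one to three levels deep, which is why tuned Level 3 routines usually benefit most from cache blocking. In production code the library routines should be preferred over loops like these.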
BLAS operations are used as building blocks for higher level operations in other libraries such as LAPACK.
LAPACK, originally written in FORTRAN 77, is a very widely used library for solving problems in (dense) linear algebra. In addition to solving linear equations, the subroutines handle least squares problems, eigenvalue problems and singular value problems.
More information about LibSci can be found in the Cray Application Developer's Environment User's Guide; see http://docs.cray.com.
There are also general man pages for LibSci: intro_libsci(3s), intro_blas1(3s), intro_blas2(3s), intro_blas3(3s), intro_irt(3). Some of these can be displayed without the leading intro_, e.g., man blas1, man libsci. References to additional manual pages can be found in these general manual pages. There are also manual pages for individual routines.
Usage from C
If you require a C interface to BLAS and LAPACK but want to use the Cray XT-LibSci BLAS or LAPACK routines, you must use the Fortran interfaces. They can be accessed from a C program by appending an underscore to the respective routine names and by passing all arguments by reference (rather than by value, as is usual in C). C programmers using the Fortran interface should note that arrays must be stored in Fortran column-major order. In the linking phase you must use a Fortran compiler as the linker. For PGI you must add the -Mnomain option in the linking phase:
cc -c libsci_blas_prog.c
ftn -Mnomain -o libsci_blas_prog libsci_blas_prog.o
For PathScale this option is not needed:
cc -c libsci_blas_prog.c
ftn -o libsci_blas_prog libsci_blas_prog.o
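The calling convention described above can be sketched as follows. The prototype for dgemv_ is written by hand from the reference BLAS interface, since no C header is assumed here. The macro USE_FORTRAN_BLAS and the function matvec_2x3 are hypothetical names introduced for this sketch. Without the macro, a small stand-in with the same name and argument list is compiled so that the sketch builds without any library; with it, the symbol is expected to come from the library at link time. Some compilers pass hidden string-length arguments for CHARACTER arguments, which this sketch ignores.

```c
#include <assert.h>

#ifdef USE_FORTRAN_BLAS
/* Hand-written prototype of the Fortran routine DGEMV: trailing
 * underscore, every argument passed by reference. */
extern void dgemv_(const char *trans, const int *m, const int *n,
                   const double *alpha, const double *a, const int *lda,
                   const double *x, const int *incx,
                   const double *beta, double *y, const int *incy);
#else
/* Stand-in for illustration only: same name and argument list,
 * restricted to trans = 'N' and unit strides. */
void dgemv_(const char *trans, const int *m, const int *n,
            const double *alpha, const double *a, const int *lda,
            const double *x, const int *incx,
            const double *beta, double *y, const int *incy)
{
    assert(*trans == 'N' && *incx == 1 && *incy == 1);
    for (int i = 0; i < *m; i++)
        y[i] *= *beta;
    for (int j = 0; j < *n; j++)
        for (int i = 0; i < *m; i++)
            y[i] += *alpha * a[i + j * *lda] * x[j];
}
#endif

/* Computes y := A*x for the 2-by-3 matrix
 *     A = | 1 2 3 |
 *         | 4 5 6 |
 * stored column by column, as the Fortran interface expects. */
void matvec_2x3(const double x[3], double y[2])
{
    const double a[6] = {1.0, 4.0,   /* column 1 */
                         2.0, 5.0,   /* column 2 */
                         3.0, 6.0};  /* column 3 */
    const char trans = 'N';
    const int m = 2, n = 3, lda = 2, inc = 1;
    const double alpha = 1.0, beta = 0.0;

    y[0] = 0.0;
    y[1] = 0.0;
    /* Scalars are passed by address, not by value. */
    dgemv_(&trans, &m, &n, &alpha, a, &lda, x, &inc, &beta, y, &inc);
}
```

Compiled with -DUSE_FORTRAN_BLAS, the stand-in is dropped and the object file would be linked with ftn as in the commands above.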
FFTW
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e., the discrete cosine/sine transforms or DCT/DST). On Louhi, there are two versions of FFTW: 2 and 3. These versions are incompatible with each other. MPI parallel transforms are only available in version 2.1.5. For detailed instructions on the use of FFTW, see http://www.fftw.org. See also the Cray Application Developer's Environment User's Guide at http://docs.cray.com.
You must load the FFTW version you need with the module load command. The command module avail fftw shows which versions are available. FFTW 3.x.y and FFTW 2.1.5 cannot be loaded at the same time. For more information on the use of version 3, see man intro_fftw3. You do not need to give the -l flag; the correct libraries are linked automatically.
Please note that the Cray XT versions fftw/3.2.1 and fftw/3.2.2 do not work with GNU libtool (see info libtool or libtool --help) in the default static linking on Cray XT. If your application needs libtool when linking the FFTW 3.2.1 or 3.2.2 libraries statically, use fftw/3.2.2.1, which is patched for Cray XT, or a higher version instead.
The FFTW version can be changed, e.g., to 2.1.5.1 (the patched 2.1.5 for Cray XT) with a command such as:
module swap fftw/3.2.2.1 fftw/2.1.5.1
For more information on the use of versions 2.1.x, see man intro_fftw2. With these versions it is necessary to specify the libraries to be linked, e.g., for double precision routines:
f77 -o test test.F -ldfftw -ldrfftw
Both FFTW libraries can also be linked to Fortran programs, as the example above shows.
ACML
The ACML routines can be called from both Fortran and C programs. ACML comprises the following parts:
- BLAS : Basic Linear Algebra Subprograms
- LAPACK : Linear Algebra Package for solving linear equations and eigenvalue problems
- FFT : a set of routines for Fast Fourier Transform
- RNG : a set of Random Number Generators and statistical distribution functions
- Fast vectorized versions of standard mathematical functions
For comprehensive documentation of ACML, see http://developer.amd.com/assets/acml_userguide.pdf
This short introduction follows the manual very closely.
Usage
The ACML module can be loaded with the command:
module load acml
which loads the default version 4.2.0. By giving the version number you can load an older or newer version (if available, see module avail acml).
Most of the available modules are for the serial version of ACML. For most of the OpenMP versions (suffix _mp in directory and library names) and INTEGER*8 versions (suffix _int64 in directory names), you must define the needed environment variables yourself, make your own modules, or use the -I, -L and -l options in compilation and linking.
When linking in ACML routines with PGI compilers, you must compile and link all program units with -Mcache_align or an aggregate option such as -fastsse, which incorporates -Mcache_align. For most of the versions, for which there are loadable modules, no other compiler options are needed to link the library routines to your program when you are using the PGI compilers. The compiler should automatically look for the routines. PathScale compilers do not need any extra options for linking ACML routines. With GCC compilers there may be various difficulties depending on the compiler command and you may need to add various options.
Please note that the PGI compiler versions 7.1 and 7.2 are not compatible with the ACML versions 4.0 and 4.1. Use PGI version 8.0 (the default is now 8.0.6) and ACML version 4.2 instead.
Please note also that ACML version 4.3.0 does not support the PathScale compilers, and the PathScale ACML libraries are missing from it. Instead, support for the Open64 compilers (their libraries) has been added.
In addition, the serial version of ACML 4.0.1 is not compatible at all with the version 4.2 GCC compilers, especially gfortran, because it is built with GCC version 4.1. Therefore you must use the version 4.1 GCC compilers with the serial version 4.0.1. The OpenMP version 4.0.1 (suffix _mp) can be used with the version 4.2 GCC compilers, because it is built with that version. However, use the module acml/4.0.1a instead of acml/4.0.1 with gfortran 4.1 and 4.2, because acml/4.0.1 lacks a subdirectory needed by the GCC compilers. Strictly speaking, that subdirectory, gfortran64, is not missing; rather, the Cray compiler wrappers look for it under the wrong name, gnu64, and the link gnu64 -> gfortran64 is missing from the acml/4.0.1 installation. The link has been added to the acml/4.0.1a installation.
The actual algorithms behind some LAPACK subroutines in ACML differ from those used in the LAPACK source in public domain. Both functionally and numerically the subroutines conform to the usual LAPACK conventions.
Fast Fourier Transforms
Discrete Fourier Transforms in ACML come in two types:
- The transforms in the first type map complex data to complex data. These routines have names beginning with ZFFT (double precision) or CFFT (single precision). There are separate routines for 1D, 2D and 3D transforms. Applying forward and backward transforms consecutively recovers the original data.
- Transforms of the second type map real data to complex data or vice versa. The names begin with DZFFT or SCFFT (real to complex) and ZDFFT or CSFFT (complex to real). These routines are available only for 1D sequences, and consecutive forward and backward transforms will NOT recover the original data; rather, the transform must be conjugated before the backward transform to recover the data.
These routines do not cover all possible transforms. For more information, see man intro_fft.
Random number generators
ACML has five different Base Random Number Generators (BRNG) for producing sequences of pseudo-random numbers uniformly distributed over the open interval (0,1). In addition, there are 23 distribution generators for transforming the uniformly distributed numbers to variates from specified distributions (for example, the normal distribution or the chi-squared distribution).
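The split between a base generator and a distribution generator can be illustrated with a minimal C sketch. It does not call ACML: the base generator below is the classical Park-Miller multiplicative congruential generator (modulus 2^31 - 1, multiplier 16807), chosen here only because it is short, and the names pm_next, pm_uniform01 and uniform_ab are hypothetical. Because the state never becomes 0 or the modulus, the scaled value lies strictly inside (0,1).

```c
#include <assert.h>
#include <stdint.h>

#define PM_MODULUS    2147483647u   /* 2^31 - 1, a prime */
#define PM_MULTIPLIER 16807u

/* Base generator step: a state in [1, m-1] maps to a state in [1, m-1]. */
uint32_t pm_next(uint32_t state)
{
    assert(state >= 1 && state < PM_MODULUS);
    return (uint32_t)(((uint64_t)state * PM_MULTIPLIER) % PM_MODULUS);
}

/* Scale a state to a double strictly inside the open interval (0,1). */
double pm_uniform01(uint32_t state)
{
    return (double)state / (double)PM_MODULUS;
}

/* A trivial "distribution generator": maps a uniform (0,1) number u
 * to a uniform variate on (a,b). */
double uniform_ab(double u, double a, double b)
{
    return a + (b - a) * u;
}
```

A caller would keep the state in a variable, update it with pm_next before each draw, and feed pm_uniform01(state) into the distribution transform. Other distributions follow the same pattern with a different transform.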
Istanbul processor support
In ACML version 4.3.0, the Level 1 BLAS routines have been tuned for AMD Istanbul processors. The routines affected include xDOT, xCOPY, xAXPY, and xSCAL.
PETSc
PETSc is the Portable, Extensible Toolkit for Scientific Computation, an open source library of sparse solvers. There are two PETSc modules:
- petsc for real data
- petsc-complex for complex data
The present release is 2.3.3. You must load one of the PETSc modules before you can use the library:
module load petsc
You may first try compiling and linking C programs with the usual command:
cc -o petsc_prog petsc_prog.c
However, if one or more of the PETSc routines used in your program call, for example, BLAS routines, which are Fortran routines, you must use the Fortran compiler for linking, also for C programs. For PGI this means:
cc -c petsc_prog.c
ftn -Mnomain -o petsc_prog petsc_prog.o
or compilation and linking in one phase directly:
ftn -Mnomain -o petsc_prog petsc_prog.c
The meaning of the option -Mnomain is explained in the man page pgf90(1). For PathScale this option is not needed (there is no such option). At present, we do not know how linking must be done with the GCC compilers.
For details, see http://www-unix.mcs.anl.gov/petsc/petsc-as/index.html, the Cray Application Developer's Environment User's Guide at http://docs.cray.com, and man intro_petsc (available when one of the PETSc modules is loaded).