OpenMP is an implementation of multithreading, a method of parallelization in which the master thread forks a specified number of concurrently running slave threads and a task is divided among them. In addition to library routines, OpenMP provides Fortran directives, C and C++ pragmas, and environment variables.
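The fork-join idea described above can be sketched in a few lines of C. This is only an illustration, not code from the Louhi documentation: the function name parallel_sum is hypothetical, and the pragma uses only constructs available in OpenMP 2.5.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n values. With OpenMP enabled, the loop iterations are divided
   among the threads of the team and the partial sums are combined by
   the reduction clause. Without OpenMP the pragma is ignored and the
   loop runs serially. */
double parallel_sum(const double *x, int n)
{
    double sum = 0.0;
    int i;

    assert(x != NULL && n >= 0);

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += x[i];

    return sum;
}
```

The master thread forks the team at the pragma and continues alone after the loop, so the caller sees an ordinary function call.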
With threading enabled by Cray Linux Environment, Louhi now supports version 2.5 of the OpenMP API. The PGI, PathScale, and GNU compilers support OpenMP.
To use OpenMP, you need to include the appropriate OpenMP option on the compiler command line. The compiler command options are:
You also need to set the OMP_NUM_THREADS environment variable to the number of threads in the team.
The number of processors hosting OpenMP threads at any given time is fixed at program startup and specified by the aprun -d option (see Job launching command: aprun for further information).
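Whether OMP_NUM_THREADS has taken effect can be checked from inside the program. The following is a minimal sketch under the assumption that the helper name team_size is free to use; it is not part of OpenMP or of the Louhi environment. The _OPENMP guard lets the same source compile when the OpenMP option is left out.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the number of threads that form the team of a parallel
   region. Hypothetical helper for illustration only. */
int team_size(void)
{
    int n = 1;  /* serial fallback when compiled without OpenMP */

#ifdef _OPENMP
    #pragma omp parallel
    {
        /* Only the master thread writes the shared variable. */
        #pragma omp master
        n = omp_get_num_threads();
    }
#endif

    assert(n >= 1);  /* a team always contains at least the master */
    return n;
}
```

With OMP_NUM_THREADS set to 4, the function is expected to return 4 unless the runtime supplies fewer threads.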
OpenMP can be combined with MPI in hybrid OpenMP/MPI applications, but the OpenMP threads of a process must share the same memory, i.e. they may not cross node boundaries. In OpenMP/MPI applications, MPI calls can be made from master or sequential regions but not from OMP parallel regions.
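The restriction on where communication may happen can be sketched as follows. To keep the example compilable without an MPI library, the hypothetical function exchange_with_other_ranks stands in for real MPI calls; threaded_step and exchange_calls are likewise illustrative names, not part of any library.

```c
#include <assert.h>

static int exchange_calls = 0;  /* how often the stand-in was invoked */

/* Hypothetical stand-in for MPI communication; a real program would
   call MPI routines here. */
static void exchange_with_other_ranks(double *value)
{
    (void)value;
    exchange_calls++;
}

/* All threads share the computation; only the master communicates. */
double threaded_step(int n)
{
    double total = 0.0;
    int i;

    assert(n >= 0);

    #pragma omp parallel
    {
        #pragma omp for reduction(+:total)
        for (i = 0; i < n; i++)
            total += 1.0;
        /* Implicit barrier after the loop: total is complete here. */

        #pragma omp master
        exchange_with_other_ranks(&total);

        /* The master construct has no implied barrier, so the other
           threads wait explicitly until communication is done. */
        #pragma omp barrier
    }
    return total;
}
```

However many threads the team has, the stand-in is invoked once per call of threaded_step, which is the property a master region provides.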
The usage of OpenMP in Louhi is exemplified here by a simple MPI/OpenMP program omp.c:
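#include <mpi.h>
#include <omp.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
    int rank, nid, thread;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    PMI_CNOS_Get_nid(rank, &nid);  /* assumed: node-ID call is missing from the source */
    #pragma omp parallel private(thread)
    {
        thread = omp_get_thread_num();
        #pragma omp barrier
        printf("Hello from rank %d (thread %d) on nid%05d", rank, thread, nid);
        if (thread == 0)
            printf(" <-- master\n");
        else
            printf(" <-- subordinate\n");
    }
    MPI_Finalize();
    return 0;
}
The program is compiled and linked as
cc -mp=nonuma omp.c -o omp
To run the program interactively, set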
setenv OMP_NUM_THREADS 4
before launching, for example, a two-node job on XT4 nodes or a one-node job on XT5 nodes (8 cores in both cases), depending on which node type aprun selects or which you select for it with the -L option in an interactive job, with
aprun -n 2 -d 4 ./omp
On XT5 nodes, jobs with 8 threads within one node are also possible:
setenv OMP_NUM_THREADS 8
aprun -n 2 -d 8 ./omp
In fact, you can use any number of threads, but if there are more than 4 per XT4 node or more than 8 per XT5 node, performance may degrade, because at least one core then runs more than one thread. The optimal number of threads is almost always equal to the number of cores in use. See also Chapter Parallel batch jobs for examples of batch job scripts for pure OpenMP and mixed OpenMP/MPI jobs.
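The arithmetic behind these limits can be written down as a small sketch. Both function names are hypothetical and serve only to illustrate the reasoning: the first gives the number of cores covered by the aprun options -n and -d, the second the number of threads that must share the busiest core.

```c
#include <assert.h>

/* Cores covered by "aprun -n ranks -d depth": each of the ranks
   processes gets depth cores for its threads. */
int cores_reserved(int ranks, int depth)
{
    assert(ranks > 0 && depth > 0);
    return ranks * depth;
}

/* Largest number of threads sharing one core when `threads` OpenMP
   threads run on `cores` cores (ceiling division). A result above 1
   means the cores are oversubscribed. */
int max_threads_per_core(int threads, int cores)
{
    assert(threads > 0 && cores > 0);
    return (threads + cores - 1) / cores;
}
```

For the first example above, cores_reserved(2, 4) gives 8, matching the 8 cores mentioned in the text, while max_threads_per_core(8, 4) gives 2, which is the oversubscribed case of 8 threads on a 4-core XT4 node.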