CrayPat Performance Analysis Tool

This section describes the CrayPat performance analysis tool.

The CrayPat tool provides detailed information about application performance. It can be used for basic profiling, MPI tracing and hardware performance counter based analysis.

CrayPat is a suite of utilities, the most important of which are pat_build and pat_report.

Usage

To use CrayPat the user must first load the appropriate module:
module load xt-craypat

The log files produced by CrayPat must be written to the Lustre file system (under $WRKDIR). You may need to set the environment variable PAT_RT_EXPFILE_DIR to a suitable directory, e.g.,

setenv PAT_RT_EXPFILE_DIR /mylustre/directory
Note that if the program is run as a batch job, this environment variable must be set in the batch script.
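
As a minimal sketch of this setup inside a batch script, the fragment below sets the variable and creates the directory before the program is launched. It is written in sh syntax (with csh, use setenv as shown above). The directory name craypat_logs is hypothetical, and the fallback to /tmp is an assumption made only so that the sketch also runs on systems where $WRKDIR is not defined.

```shell
#!/bin/sh
# Assumption: $WRKDIR points to your work directory on Lustre.
# The /tmp fallback exists only so this sketch runs on other systems.
WRKDIR="${WRKDIR:-/tmp}"

# Hypothetical directory name; choose any directory under $WRKDIR.
PAT_RT_EXPFILE_DIR="$WRKDIR/craypat_logs"
export PAT_RT_EXPFILE_DIR

# Create the directory if it does not exist yet.
mkdir -p "$PAT_RT_EXPFILE_DIR"
echo "CrayPat log files go to $PAT_RT_EXPFILE_DIR"

# Launch the instrumented program after this point.
```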

Next, the program has to be recompiled or at least relinked. CrayPat requires that the .o files used to create the executable are present, along with any .a files that were created. In some cases the original source files must be available as well.
Here is an example with separate compile and link steps:
ftn -c my_sourcefile.f90
ftn -o my_program my_sourcefile.o

pat_build


The executable is instrumented with pat_build. The experiment type is chosen with a flag. Use the -u option to trace all user-defined functions in your program:
pat_build -u my_program

The option -g enables you to instrument all function entry point references belonging to a specified tracing group.

pat_build -g group my_program
where group is chosen from the table below (choose one or more groups):

Tracegroup  Description
heap        Dynamic heap information
io          stdio and sysio calls
math        ANSI math library calls
mpi         MPI calls
shmem       SHMEM calls
stdio       I/O with buffered I/O construct
sysio       System I/O calls
system      System calls
omp         OpenMP calls
pthreads    POSIX threads

For example, to trace MPI calls and all user-defined functions in your program, instrument it with -g mpi -u:
pat_build -g mpi -u my_program

The name of the instrumented version of the executable ends with +pat, so in the example the result is my_program+pat.

Running the instrumented code produces binary log files in the directory specified by the environment variable PAT_RT_EXPFILE_DIR, e.g., my_program+pat+2072tdo-0001.xf, where a process ID number, an experiment identifier and an MPI task ID are appended to the file name.

pat_report

Text reports from the log files are created with pat_report. Note that running pat_report requires the CrayPat module to be loaded. A summary of all the log files can be generated with

pat_report log_data_directory

For a single task, use the individual log file, for example:

pat_report my_program+pat+2072tdo-0001.xf
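
When a separate report is wanted for every task, the per-file command can be repeated in a loop. The following is only a sketch in sh syntax: the function name report_each_task is hypothetical, and it assumes that pat_report is available (CrayPat module loaded) and prints its report to standard output, so that the output can be redirected to a file.

```shell
#!/bin/sh
# Write one text report per log file found in a directory.
# Usage: report_each_task DIRECTORY
# Assumes pat_report is on the PATH and prints its report
# to standard output.
report_each_task() {
    dir=$1
    for xf in "$dir"/*.xf; do
        # Skip the unexpanded pattern when no .xf files exist.
        [ -e "$xf" ] || continue
        # Store the report next to the log file, with a .txt suffix.
        pat_report "$xf" > "${xf%.xf}.txt"
    done
}
```

Calling report_each_task "$PAT_RT_EXPFILE_DIR" would then leave, for instance, my_program+pat+2072tdo-0001.txt next to the corresponding .xf file.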

Without options pat_report produces a default report. The user can specify the contents and the layout of the reports in great detail by using appropriate options. There is, however, a group of standard reports that can be selected with the option -O, e.g.,

pat_report -O mpi log_data_directory

which highlights the MPI performance data (provided that MPI tracing was selected when the executable was instrumented with pat_build).

Some keywords for the -O option are listed in the table below:

Keyword       Description
profile       Subroutine level data
callers       Function callers
calltree      Calltree
heap          Heap information, instrument with -g heap
mpi           MPI statistics, instrument with -g mpi
load_balance  Load balance information

pat_report has many further options, with which very detailed information about the run can be produced. For details, see Sections 2.1 and 2.4 of http://docs.cray.com/books/S-2376-41/S-2376-41.pdf


CrayPat and hardware performance counters

CrayPat also provides hardware counter data. For example, the floating point performance or the cache behavior of an application can be investigated in this way.

The hardware performance counter experiment is defined by setting the environment variable PAT_RT_HWPC. For instance, to gather floating point performance data select counter group 5 with

setenv PAT_RT_HWPC 5

before running the code. The available counter groups are the following:

Counter group  Description
 0             Summary with instruction metrics
 1             Summary with translation lookaside buffer metrics
 2             L1 and L2 cache metrics
 3             Bandwidth information
 4             HyperTransport information, DO NOT USE, no quad-core support
 5             Floating point instructions
 6             Cycles stalled and resources empty
 7             Cycles stalled and resources full
 8             Instructions and branches
 9             Instruction cache values
10             Cache hierarchy
11             Floating point instructions 2
12             Floating point instructions (vectorization)
13             Floating point instructions (single precision)
14             Floating point instructions (double precision)
15             L3 cache
16             L3 cache, core-level reads
17             L3 cache, core-level misses
18             L3 cache, core-level fills caused by L2 evictions
19             Prefetches

More detailed information is available in the online man page: man hwpc
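
Because group 4 must not be used, it can be convenient to check the chosen number before a run. The following is a minimal sketch in sh syntax, not part of CrayPat: the helper name set_hwpc_group is hypothetical, and it only accepts the group numbers listed in the table above (0 to 19, excluding 4) before exporting PAT_RT_HWPC.

```shell
#!/bin/sh
# Select a hardware counter group for the next run.
# Usage: set_hwpc_group NUMBER
# Accepts the groups 0-19 listed in the table, except group 4.
set_hwpc_group() {
    case $1 in
        4)
            echo "group 4 must not be used (no quad-core support)" >&2
            return 1 ;;
        [0-9]|1[0-9])
            PAT_RT_HWPC=$1
            export PAT_RT_HWPC ;;
        *)
            echo "unknown counter group: $1" >&2
            return 1 ;;
    esac
}
```

For example, set_hwpc_group 5 selects the floating point group, while set_hwpc_group 4 prints an error and leaves PAT_RT_HWPC unchanged.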


Individual counters can also be accessed. See Appendix A of http://docs.cray.com/books/S-2376-41/S-2376-41.pdf for more details.

It is also possible to focus on a certain section of the code (a loop nest, for example) by using the CrayPat hwpc library. In this approach the user must insert library calls into the code to start and stop collecting counter data. For details, see Section 4 of http://docs.cray.com/books/S-2376-41/S-2376-41.pdf .