The CrayPat tool provides detailed information about application performance. It can be used for basic profiling, MPI tracing and hardware performance counter based analysis.
CrayPat is a suite of utilities, the most important of which are pat_build and pat_report.
Usage
To use CrayPat, first load the appropriate module:
module load xt-craypat
The log files produced by CrayPat must be written to the Lustre file system (under $WRKDIR). You might need to set the environment variable PAT_RT_EXPFILE_DIR to a suitable directory, e.g.,
setenv PAT_RT_EXPFILE_DIR /mylustre/directory
Note that if the program is run as a batch job, this environment variable must be set in the batch script.
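The setenv command is csh/tcsh syntax. If the batch script is written for a Bourne-type shell (sh, bash), the same setting could be sketched as follows. The subdirectory name craypat_logs is a hypothetical example, and the fallback to the current directory is there only so that the sketch can be tried on a machine where $WRKDIR is not defined.

```shell
# Hypothetical log directory under the Lustre work area ($WRKDIR).
# The fallback to the current directory is only for trying the sketch
# on a machine where WRKDIR is not defined.
expdir="${WRKDIR:-$PWD}/craypat_logs"

# Create the directory if it does not exist yet.
mkdir -p "$expdir"

# Bourne-shell equivalent of: setenv PAT_RT_EXPFILE_DIR <directory>
PAT_RT_EXPFILE_DIR="$expdir"
export PAT_RT_EXPFILE_DIR

echo "CrayPat log files will be written to $PAT_RT_EXPFILE_DIR"
```

Placing these lines in the batch script before the program is launched makes the setting visible to the instrumented executable.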
Next, the program has to be recompiled or at least relinked. CrayPat requires that the .o files used to create the executable are present, together with any .a files that were created, and in some cases the original source files must be available as well.
Here is an example with separate compile and link steps:
ftn -c my_sourcefile.f90
ftn -o my_executable my_sourcefile.o
pat_build
The executable is instrumented with pat_build. The experiment type is chosen with a flag. Use the -u option to trace all user-defined functions in your program.
pat_build -u my_program
The option -g enables you to instrument all function entry point references belonging to a specified tracing group.
pat_build -g group my_program
where group is chosen from the table below (choose one or more groups):
| Tracegroup | Description |
|---|---|
| heap | Dynamic heap information |
| io | stdio and sysio calls |
| math | ANSI math library calls |
| mpi | MPI calls |
| shmem | SHMEM calls |
| stdio | I/O with buffered I/O construct |
| sysio | System I/O calls |
| system | System calls |
| omp | OpenMP calls |
| pthreads | POSIX threads |
For example, to trace MPI calls and all user-defined functions in your program, instrument with -g mpi -u:
pat_build -g mpi -u my_program
The name of the instrumented version of the executable ends with +pat, so in the example the result is my_program+pat.
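The steps from compilation to instrumentation can be collected into one short script. The following is a minimal sketch in Bourne-shell syntax, not a definitive build procedure: it reuses the file names from the examples above (with my_program as the executable name throughout) and only prints a message if ftn or pat_build is not available in the current environment.

```shell
src="my_sourcefile.f90"    # source file name from the example above
exe="my_program"           # executable name used in the pat_build examples
instrumented="${exe}+pat"  # pat_build appends +pat to the executable name

if command -v ftn >/dev/null 2>&1 && command -v pat_build >/dev/null 2>&1; then
    # Compile and link in separate steps so that the .o file is kept.
    ftn -c "$src"
    ftn -o "$exe" "${src%.f90}.o"
    # Trace MPI calls and all user-defined functions.
    pat_build -g mpi -u "$exe"
    # Check that the instrumented executable was created.
    ls -l "$instrumented"
else
    echo "ftn or pat_build not found; load the compiler and CrayPat modules first" >&2
fi
```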
Running the instrumented code produces binary log files in the directory specified by the environment variable PAT_RT_EXPFILE_DIR, e.g., my_program+pat+2072tdo-0001.xf, where a process ID number, an experiment identifier and an MPI task ID are appended to the name of the instrumented executable.
pat_report
Text reports are created from the log files with pat_report. Note that running pat_report requires loading the CrayPat module. A summary of all the log files can be generated with
pat_report log_data_directory
For a single task, use the individual file, for example:
pat_report my_program+pat+2072tdo-0001.xf
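Before generating a report it can be useful to check that the run actually produced log files. The sketch below, in Bourne-shell syntax, counts the .xf files in the log directory and runs pat_report only if some were found. The helper count_xf_files and the output file name summary.txt are hypothetical names introduced for this illustration.

```shell
# Print the number of .xf files directly inside the given directory.
count_xf_files() {
    n=0
    for f in "$1"/*.xf; do
        # An unmatched glob stays as a literal pattern, so test existence.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Use the directory CrayPat was told to write to, or the current directory.
logdir="${PAT_RT_EXPFILE_DIR:-.}"
nfiles="$(count_xf_files "$logdir")"
echo "Found $nfiles .xf file(s) in $logdir"

if [ "$nfiles" -gt 0 ] && command -v pat_report >/dev/null 2>&1; then
    # Save the default report to a text file (hypothetical file name).
    pat_report "$logdir" > summary.txt
fi
```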
Without options pat_report produces a default report. The user can specify the contents and the layout of the reports in great detail by using appropriate options. There is, however, a group of standard reports that can be selected with the option -O, e.g.,
pat_report -O mpi log_data_directory
which highlights the MPI performance data (if MPI tracing was selected when the executable was instrumented with pat_build).
Some keywords for the -O option are listed in the table below:
| Keyword | Description |
|---|---|
| profile | Subroutine level data |
| callers | Function callers |
| calltree | Calltree |
| heap | Heap information, instrument with -g heap |
| mpi | MPI statistics, instrument with -g mpi |
| load_balance | Load balance information |
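Several of the standard reports can be produced in one go by looping over the keywords. This is a minimal sketch in Bourne-shell syntax: the output file names report_<keyword>.txt are hypothetical, and the heap and mpi keywords are left out because they require instrumentation with the matching trace group. If pat_report is not available, the loop only prints the commands it would run.

```shell
# Log directory: the CrayPat setting if present, else the placeholder
# name used in the examples above.
logdir="${PAT_RT_EXPFILE_DIR:-log_data_directory}"

# Keywords from the table above that need no special trace group.
keywords="profile callers calltree load_balance"

for kw in $keywords; do
    out="report_${kw}.txt"   # hypothetical output file name
    if command -v pat_report >/dev/null 2>&1; then
        pat_report -O "$kw" "$logdir" > "$out"
    else
        echo "would run: pat_report -O $kw $logdir > $out"
    fi
done
```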
pat_report has many further options for producing very detailed information about the run. For details, see Sections 2.1 and 2.4 of http://docs.cray.com/books/S-2376-41/S-2376-41.pdf
CrayPat and hardware performance counters
The CrayPat tool also provides hardware counter data. For example, the floating point performance or the cache behavior of an application can be investigated in this way.
The hardware performance counter experiment is defined by setting the environment variable PAT_RT_HWPC. For instance, to gather floating point performance data select counter group 5 with
setenv PAT_RT_HWPC 5
before running the code. The available counter groups are the following:
| Counter group | Description |
|---|---|
| 0 | Summary with instruction metrics |
| 1 | Summary with translation lookaside buffer metrics |
| 2 | L1 and L2 cache metrics |
| 3 | Bandwidth information |
| 4 | Hypertransport information, DO NOT USE, no quad-core support |
| 5 | Floating point instructions |
| 6 | Cycles stalled and resources empty |
| 7 | Cycles stalled and resources full |
| 8 | Instructions and branches |
| 9 | Instruction cache values |
| 10 | Cache hierarchy |
| 11 | Floating point instructions 2 |
| 12 | Floating point instructions (vectorization) |
| 13 | Floating point instructions (single precision) |
| 14 | Floating point instructions (double precision) |
| 15 | L3 cache |
| 16 | L3 cache, core-level reads |
| 17 | L3 cache, core-level misses |
| 18 | L3 cache, core-level fills caused by L2 evictions |
| 19 | Prefetches |
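In a Bourne-type shell (sh, bash) the counter group is set with export instead of setenv. The sketch below also guards against picking a group number outside the table or group 4, which the table marks as not to be used. The helper valid_hwpc_group is a hypothetical name introduced for this illustration and checks predefined group numbers only.

```shell
# Succeeds if the argument is a counter group from the table above
# (0-19), excluding group 4, which the table marks as DO NOT USE.
valid_hwpc_group() {
    case "$1" in
        4) return 1 ;;
        [0-9]|1[0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

group=5   # floating point instructions
if valid_hwpc_group "$group"; then
    # Bourne-shell equivalent of: setenv PAT_RT_HWPC 5
    PAT_RT_HWPC="$group"
    export PAT_RT_HWPC
    echo "PAT_RT_HWPC set to $PAT_RT_HWPC"
else
    echo "Counter group $group is not usable" >&2
fi
```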
Individual counters can also be accessed. See Appendix A of http://docs.cray.com/books/S-2376-41/S-2376-41.pdf for more details.
It is also possible to focus on a certain section of the code (a loop nest, for example) by using the CrayPat hwpc library. In this approach the user must insert library calls into the code to start and stop collecting counter data. For details, see Section 4 of http://docs.cray.com/books/S-2376-41/S-2376-41.pdf.