HPCEuropa2 workshop on application porting and optimization
The goal of this workshop is for participants to leave the course with their codes ported, compiled, profiled, tuned and ready for efficient production runs. The platform used for the course will be Louhi, CSC's flagship machine, a Cray XT4/XT5. The strategies presented are applicable to other platforms as well.
The course consists of a detailed introduction to the following topics:
- Compiling and porting codes to the Cray XT4 and Cray XT5
- Profiling and optimizing parallel codes
- Program development tools (debugging, HW counters) and libraries on Louhi
Roughly half of the course will be devoted to hands-on work on participants' own codes. We encourage everybody to bring their own laptop if possible. Attendees should preferably apply for their own user account on Louhi before the course; training accounts will also be provided. The lectures will be broadcast as a web stream, so you may attend the course virtually. You may also participate in the hands-on sessions remotely over an SSH connection.
Program
09:00 – 09:15 John Levesque Introduction (flash video 39:09)
09:15 – 09:45 John Levesque Overview of the Cray XT Architecture
i. Node architecture
ii. Interconnect
09:45 – 10:15 Kevin Roy The Cray Linux Environment (flash video 46:24)
i. Overview
ii. CLE features
iii. CLE Programming
iv. The storage environment
10:15 – 10:30 Break
10:30 – 11:15 Luiz DeRose Programming Environment for the Cray XT system (flash video 1:05:19)
i. Overview
ii. Modules
iii. Compilers (CCE and PGI)
iv. PE User Guide
11:15 – 12:15 John Levesque A methodical approach for analyzing applications (flash video 21:44)
i. Formulating a problem
ii. Potential bottlenecks
• Memory hierarchy (TLB & cache)
• Load imbalance
• Computation
• Communication
• I/O
12:15 – 13:30 Lunch
13:30 – 14:15 Heidi Poxon Performance measurement on the Cray XT system (flash video 32:26)
i. Overview
ii. Automatic Profiling Analysis
14:15 – 14:45 Luiz DeRose Using Hardware Performance Counters (flash video 43:47)
14:45 – 15:00 Break
15:00 – 15:30 Kevin Roy Job launching & running a batch application (flash video 23:21)
15:30 – 17:00 Hands on Lab Profiling applications
22/09/09 - Second Day (We will finalize the presentations on how to identify performance bottlenecks. The attendees will use Cray Apprentice2 for performance visualization and will learn various optimization techniques. The attendees will start to tune their applications at the hands on lab.)
09:00 – 09:30 Heidi Poxon Profile visualization with Cray Apprentice2 (demo with hands on lab) (flash video 28:46)
09:30 – 10:20 Luiz DeRose Load Imbalance Analysis (flash video 50:22)
i. MPI Sync Time
ii. Imbalance metrics
iii. MPI Rank reorder
10:20 – 10:35 Break
10:35 – 11:00 Luiz DeRose How to make the best use of Cray MPI on the XT (flash video 23:33)
11:00 – 12:15 John Levesque Optimization techniques (flash video 1:09:19)
i. Addressing memory hierarchy problems
• Cache optimization
• TLB Optimization
ii. Vectorization
12:15 – 13:30 Lunch
13:30 – 13:45 Machine room visit
13:45 – 14:15 John Levesque Optimization techniques (continued) (flash video 27:47)
iii. Pre-posting receives
iv. Resolving scaling issues
• Load imbalance
• Communication
• Computation
14:15 – 14:30 Break
14:30 – 17:00 Hands on Lab Tuning applications
17:00 – Social event: sauna + refreshments
23/09/09 - Third Day (The attendees will learn advanced techniques to deal with scaling problems and how to access the on-line documentation for user help. In the hands on lab the attendees will continue to tune their applications.)
09:00 – 09:15 Heidi Poxon Performance analysis of OpenMP Applications (flash video 10:36)
09:15 – 10:15 John Levesque Using OpenMP to mitigate scaling problems (flash video 39:42)
i. Scaling higher with OpenMP
ii. Improving Load imbalance with OpenMP
10:15 – 10:30 Break
10:30 – 11:00 Luiz DeRose Trace analysis (flash video 22:51)
i. Trace file generation
ii. Trace visualization
11:00 – 11:15 Heidi Poxon User help (flash video 17:33)
i. Using man pages
ii. Using pat_help
iii. Using Cray Apprentice2 on-line documentation
11:15 – 12:00 John Levesque CAF & UPC (flash video 56:42)
12:00 – 13:30 Lunch
13:30 – 17:00 Hands on Lab Tuning applications
24/09/09 - Fourth Day (We will cover additional topics and the attendees will complete the tuning of their applications.)
09:00 – 09:30 Luiz DeRose Libraries
i. Cray Scientific Libraries (flash video 25:53)
ii. Debugging Tools (flash video 21:31)
10:00 – 10:05 Heidi Poxon Fast Track Debugging (flash video 02:41)
10:10 – 11:00 Kevin Roy I/O Optimization (flash video 53:50)
11:00 – 12:00 Hands on Lab Tuning applications
Materials
- Roy: Cray Linux Environment
- DeRose: Using hardware performance counters
- Poxon: Profile visualization with Apprentice2
- Roy: Running on XT compute nodes
- Levesque: Removing bottlenecks to high performance
- Levesque: A methodical approach to scaling to large numbers of cores
- DeRose: How to make the best use of the MPI in the Cray XT system
- Workshop flyer
- Poxon: Performance measurement on the Cray XT system
- DeRose: Programming environment for the Cray XT system
- Levesque: XT# hardware architecture
- Roy: Achieving I/O performance
- DeRose: Cray scientific libraries
- DeRose: Cray debugging support tools
- DeRose: Trace analysis
- Levesque: Using OpenMP to remove scaling bottlenecks
- Levesque: Using Co-array Fortran
- Poxon: Documentation for the Cray Performance Toolset
Registration
Additional information
Further information is provided by Pekka Manninen (pekka.manninen(at)csc.fi, +358 50 3819 039).
HPCEuropa2: EC-funded collaborative research visits and access to some of Europe's biggest supercomputers. For scientists of all levels, in all disciplines, from all EU states. http://www.hpc-europa.eu
