“The positive effects of the FinHPC project stem primarily from national collaboration in developing application programs for science. FinHPC was a joint project between several universities and CSC. The HPC-lab of Åbo Akademi University played a very significant role in code optimization. In addition to optimization, the project arranged training courses in parallel programming, code optimization, and use of software development tools,” says the project manager, Jan Åström from CSC – IT Center for Science, Ltd.
“At the beginning of the project we collaborated at the national level to map the needs for software optimization. Based on the survey, we selected the codes to be tested for optimization and established relationships with the software copyright owners. In the selection we prioritized the importance of the code, optimization need, and expected benefits,” says Åström.
Approximately 30 scientific programs were processed in the FinHPC project. Most of them had been written by researchers to serve their own needs. Since scientists usually lack in-depth knowledge of programming, the programs had not been able to use the full capacity of the new supercomputers. During the FinHPC project, the performance of most of the programs was improved significantly.
Optimization is still continuing
The FinHPC project and its results have attracted an enormous amount of international attention. Even before the project was completed, project members had been invited to join similar EU projects, such as Euforia and PRACE.
Thanks to FinHPC, the recognition of CSC and of Finland has also risen to a new level. The project has been presented at international conferences, and a number of training events have been arranged. Project members have written 13 scientific articles and a few more general articles.
Participation in this project was also especially important for the HPC-lab of Åbo Akademi University, because it helped the group to build its profile as a center of excellence in fusion plasma simulation. It heads the Euforia program, which is responsible for the parallelization and optimization of 12 fusion reactor simulation codes.
The tangible outcome of the FinHPC project was 27 improved codes or code libraries. In addition, a benchmark set for load testing was created and used in the acquisition phase of CSC’s new supercomputer. The project staff organized training and participated in international conferences and workshops. The project started at the beginning of April 2005 and was completed at the end of 2008.
Optimization of scientific programs continues at the European level, for example in the PRACE project. One of its working groups concentrates on software optimization and another on benchmark programs for load testing. CSC is well represented in both working groups.
The project has increased HPC expertise
The HPC group at the Department of Information Technologies of Åbo Akademi University was one of the participants in the FinHPC project. “Thanks to the project, our department has been able to build up its skills in high-performance computing and to create international relationships,” says Jan Westerholm. “We acquired several new programs and made significant improvements to almost all of them.”
“In some cases, the code was optimized. For example, the running time of a physical and chemical correlation analysis program was shortened by a factor of 112, and the memory use was reduced by a factor of 88. In this case, we were able to give up the original idea of parallelizing the program, because the running time was reduced to less than two seconds. Some of the studied programs worked well with a single processor, but they needed optimization in order to function better in a parallel environment,” Westerholm explains.
Artur Signell at Åbo Akademi University’s HPC-lab succeeded in improving the parallel performance of the ELMFIRE program during the project. “ELMFIRE was developed jointly by VTT Technical Research Centre of Finland and Helsinki University of Technology (HUT), and it is intended to simulate high-temperature plasmas in a toroid-shaped space. Particles move in a magnetic field within the toroid and randomly bump into each other or the toroid walls. Their trajectories are integrated as a function of time,” Signell explains.
“The electromagnetic interactions of the particles are taken into account by treating each particle as a source of charge for the electrostatic voltage as the particles traverse the field. ELMFIRE already ran in parallel before the FinHPC project, but the researchers wanted to make it scale to larger numbers of particles. The idea is to simulate the future ITER test reactor, which is currently being designed and built in Cadarache in Southern France,” says Signell.
The current trend is to write platform-independent source code that can be compiled on most systems using only widely available, preferably open-source, software libraries. ELMFIRE used commercial numerical libraries for matrix computations and random number generation. These libraries were replaced by open-source routines. The GNU Scientific Library (GSL) was selected for generating random numbers and PETSc (Portable, Extensible Toolkit for Scientific Computation) for matrix processing. Thanks to these changes the program could be run in CSC’s Opteron-based parallel environment, which doubled its performance.
The HPC group at the Department of Information Technologies of Åbo Akademi University was one of the participants in the FinHPC project. Researcher Artur Signell (on the right) was able to make significant improvements to the ELMFIRE code used in fusion plasma modeling. Professor Jan Westerholm thinks that the FinHPC project has enabled the HPC group to increase its high-performance computing skills.
Optimization phases
The program under study is first profiled at the HPC-lab of Åbo Akademi University. In this phase, the time spent in the different parts of the program is measured. Programs often spend most of their time in just a few loops, but ELMFIRE seemed to spend its time evenly across its subprograms. This indicated that code optimization would not shorten the run time very much. Even if the run time of one subprogram were halved, the total run time would fall by only half of that subprogram’s small share, which would not repay the time invested in the optimization.
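The reasoning about halving one subprogram can be illustrated with a short calculation in the spirit of Amdahl's law. This is only a sketch: the function name and the profile shares (80 % and 10 %) are hypothetical values chosen for illustration, not measurements from ELMFIRE.

```python
def total_speedup(fraction, local_speedup):
    """Overall speedup when a part that takes `fraction` of the run time
    is made `local_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical profile A: one loop accounts for 80 % of the run time.
hot_loop = total_speedup(0.80, 2.0)

# Hypothetical profile B: time is spread evenly over ten subprograms,
# so halving one of them touches only 10 % of the run time.
flat_profile = total_speedup(0.10, 2.0)

print(f"hot loop halved:      {hot_loop:.2f}x faster overall")
print(f"one of ten halved:    {flat_profile:.2f}x faster overall")
```

With a flat profile, halving a single subprogram yields only about a five percent gain overall, which is why optimizing individual routines looked unrewarding in this case.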
Next, two scalability tests were performed for the program. First, the test job remained the same but the number of processors was doubled several times to see whether the run time could be halved. In the second scalability test both the number of processors and the size of the job, i.e. the number of grid points, were doubled, and it was thus expected that the running time would remain the same.
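The two tests correspond to what is commonly called strong scaling (fixed job size) and weak scaling (job size grows with the processor count). The sketch below shows how the efficiency of each could be computed from measured run times. The function names and all timings are hypothetical, made up for illustration only.

```python
def strong_scaling_efficiency(t_base, t_n, n_base, n):
    """Fixed job size: ideally the run time falls by n_base / n,
    so an efficiency of 1.0 means the run time halves on every doubling."""
    return (t_base * n_base) / (t_n * n)


def weak_scaling_efficiency(t_base, t_n):
    """Job size grows with the processor count: ideally the run time
    stays constant, so an efficiency of 1.0 means no slowdown."""
    return t_base / t_n


# Hypothetical run times in seconds, keyed by number of processors.
strong_times = {16: 100.0, 32: 52.0, 64: 28.0}
weak_times = {16: 100.0, 32: 104.0, 64: 110.0}

strong_eff = {n: strong_scaling_efficiency(strong_times[16], t, 16, n)
              for n, t in strong_times.items()}
weak_eff = {n: weak_scaling_efficiency(weak_times[16], t)
            for n, t in weak_times.items()}

for n in sorted(strong_times):
    print(f"{n:3d} processors: strong {strong_eff[n]:.2f}, weak {weak_eff[n]:.2f}")
```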
Shortage of memory is a stumbling block
In the first test the program behaved as expected, but in the second it crashed due to lack of memory. A purpose-built subprogram written by Artur Signell showed that memory usage did not scale properly. The memory used by each processor decreased only slightly when the number of processors in the simulation was increased. Memory usage grew in proportion to the number of grid points in the torus, so it was impossible to obtain results with a denser grid, because the processors ran out of memory.
A more detailed memory usage profile showed that the problem was related to the matrix used for storing intermediate data and to the transmission of that data to the processors. The matrix held several elements for each grid point being simulated. The data represented the electrostatic voltage at each grid point and was written into the matrix at every time step. At the end of the time step the data was transferred to the processors and the buffer matrix was cleared. Some of the matrix elements contained data from several particles, but most of the elements were zeros and had not been used at all, which meant an enormous waste of memory.
The matrix was replaced by a buffer built from search trees and a hash table. This buffer is dynamic and uses a sparse matrix representation, so memory usage can be adapted to the available memory capacity. When the buffers are kept small, they cannot hold all the data from a time step at once, but since the data is transient and is passed on to other processors anyway, the buffer contents can be transmitted to the processors and cleared as soon as the buffers become full.
Dynamic buffers could demand extensive resources if the entire buffer had to be processed whenever data was saved or updated. Using a hash table makes the search for the correct elements efficient.
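The idea of a bounded sparse buffer that is flushed when full can be sketched in a few lines. This is not the ELMFIRE implementation: it uses a plain Python dictionary as the hash table, leaves out the search trees mentioned above, and the class name, the `send` callback, and the sample data are all hypothetical.

```python
class SparseBuffer:
    """Holds values only for grid points that actually receive data."""

    def __init__(self, capacity, send):
        self.capacity = capacity  # maximum number of distinct grid points held
        self.send = send          # stand-in for the transfer to other processors
        self.cells = {}           # hash table: grid index -> accumulated value

    def add(self, grid_index, value):
        # A new grid point that would exceed the capacity triggers a flush first.
        if grid_index not in self.cells and len(self.cells) >= self.capacity:
            self.flush()
        # Hash lookup finds the element directly, without scanning the buffer.
        self.cells[grid_index] = self.cells.get(grid_index, 0.0) + value

    def flush(self):
        # Pass the current contents on and clear the buffer for reuse.
        if self.cells:
            self.send(dict(self.cells))
            self.cells.clear()


sent = []  # collects what would be transmitted (hypothetical stand-in)
buf = SparseBuffer(capacity=2, send=sent.append)
for index, value in [((0, 1), 0.5), ((0, 1), 0.25), ((7, 3), 1.0), ((9, 9), 2.0)]:
    buf.add(index, value)
buf.flush()  # end of time step: send whatever is left
print(sent)
```

Memory use here depends on the chosen capacity rather than on the total number of grid points, at the cost of more frequent transfers when the capacity is small.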
The new version of ELMFIRE developed during the FinHPC project is able to carry out considerably more detailed simulations than the previous version. Memory capacity is no longer a constraint for the number of grid points being used. Increasing the number of grid points makes it possible to simulate larger toroidal geometries, while keeping the grid point density the same.
Hence, the new version can be used to simulate larger reactors than the previous version could handle. This is a huge step towards the simulations needed for the ITER reactor. ELMFIRE is one of the twelve fusion plasma simulation codes that have been adapted for different supercomputers around Europe as part of the pan-European Euforia (EU Fusion for ITER Applications) project.
Paavo Ahonen
Additional information
FinHPC Final Report in English (PDF)