Gromacs squeezing 1.1 Tflops from Cray XT4 using 384 cores

CSC organised a Gromacs Workshop on 25.2.-1.3.2007 at CSC's premises together with the Gromacs developers Erik Lindahl, David van der Spoel, and Berk Hess. The workshop was attended by almost 60 participants from all over the world. During the workshop, the new parallelisation scheme implemented in the development version of Gromacs broke the 1 Tflops barrier for the first time.

Gromacs has been the fastest molecular dynamics code in serial runs and in parallel runs on some tens of processors. This is due to its highly optimised code, in particular the inner force loops, which are written in assembly and use the SSE instructions. Now it has been shown that on a modern supercomputer equipped with a very fast interconnect (the Cray Seastar2) Gromacs also scales to hundreds of processors. During the workshop Gromacs achieved a sustained performance of 1.1 Tflops on 384 cores, measured on actual simulation work, which corresponds to a throughput of 48 ns/day. The benchmark system was a box of 108000 SPC water molecules. Long-range electrostatic interactions were treated with the reaction-field method using a cut-off of 1.2 nm.

In some cases using cut-offs for the electrostatics is an unsuitable approximation. However, the Particle Mesh Ewald (PME) scheme, which accounts for the electrostatics accurately, now also scales to hundreds of processors on Louhi, CSC's Cray XT4. This was demonstrated with a lipid bilayer system of 4096 lipids, which together with the water molecules totals 487424 atoms (four times the benchmark DPPC system). Electrostatics were treated with PME using a cut-off of 1.8 nm, and van der Waals interactions used a cut-off of 1.0 nm. Simulating this system on 1056 cores extracted 1.15 Tflops from Louhi and provided 23 ns/day of simulation.
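The two benchmarks are easier to compare once the quoted figures are reduced to per-core numbers. The following is a minimal Python sketch of that arithmetic, not part of the original measurements. It uses only the figures quoted above and assumes that 1 Tflops equals 1000 Gflops and that each SPC water molecule contributes three atoms (SPC is a three-site model). The `Benchmark` class and its field names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass

# SPC is a three-site water model, so each molecule contributes three atoms.
SPC_SITES_PER_MOLECULE = 3


@dataclass(frozen=True)
class Benchmark:
    """Hypothetical container for the figures quoted in the text."""
    name: str
    atoms: int
    cores: int
    tflops: float       # sustained performance, as quoted
    ns_per_day: float   # simulation throughput, as quoted

    @property
    def gflops_per_core(self) -> float:
        # Assumes 1 Tflops = 1000 Gflops.
        return self.tflops * 1000.0 / self.cores

    @property
    def atoms_per_core(self) -> float:
        return self.atoms / self.cores

    @property
    def core_hours_per_ns(self) -> float:
        # Core-hours spent in one day, divided by nanoseconds simulated that day.
        return self.cores * 24.0 / self.ns_per_day


water = Benchmark(
    name="SPC water, reaction-field",
    atoms=108000 * SPC_SITES_PER_MOLECULE,
    cores=384,
    tflops=1.1,
    ns_per_day=48.0,
)

bilayer = Benchmark(
    name="DPPC bilayer x4, PME",
    atoms=487424,
    cores=1056,
    tflops=1.15,
    ns_per_day=23.0,
)

if __name__ == "__main__":
    for b in (water, bilayer):
        print(
            f"{b.name}: {b.atoms} atoms, "
            f"{b.gflops_per_core:.2f} Gflops/core, "
            f"{b.atoms_per_core:.0f} atoms/core, "
            f"{b.core_hours_per_ns:.0f} core-hours/ns"
        )
```

Under these assumptions the water benchmark runs at a little under 3 Gflops per core and the PME benchmark at a little over 1 Gflops per core, with roughly half as many atoms on each core. The per-core figures are derived values only and do not separate computation from communication cost.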

For more information:

- GROMACS