Louhi User's Guide, the 2nd Edition > Louhi hardware > Hardware overview

Hardware overview of Louhi

This section gives an overview of the Louhi hardware.

Louhi is a Cray XT4/XT5 Massively Parallel Processor (MPP) supercomputer. 

Louhi has a theoretical peak performance of Rpeak = 102.2 Tflop/s, not counting the service nodes that handle logins, I/O, etc., and an estimated Linpack performance of about Rmax = 76.5 Tflop/s. The architecture of Louhi is optimized especially for massively parallel jobs. More information on the Linpack test and how it is used to rank the world's supercomputers can be found on the TOP500 website (http://www.top500.org/).

The Cray XT4 part contains 4048 compute cores in 1012 compute nodes, located in 11 cabinets. Each XT4 compute node contains one quad-core processor and thus 4 cores. The Cray XT5 part contains 6816 compute cores in 852 compute nodes, located in 9 cabinets. Each XT5 compute node contains two quad-core processors and thus 8 cores (see Figure 1 in the chapter Compute nodes). Each processor together with its memory constitutes a NUMA (Non-Uniform Memory Access) node, so an XT5 compute node contains two NUMA nodes. More information: Compute nodes
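The core counts above follow from nodes × processors per node × 4 cores per processor. The following Python sketch reproduces them and counts NUMA nodes under the rule stated above (one processor with its memory is one NUMA node). The `Partition` class is purely illustrative and not part of any Louhi tool.

```python
# Compute-node inventory of Louhi, using the node counts given in this section.
# Socket counts follow from the text: one quad-core processor per XT4 node,
# two quad-core processors per XT5 node. `Partition` is a hypothetical helper.
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    nodes: int
    sockets_per_node: int      # quad-core processors per node
    cores_per_socket: int = 4  # quad-core

    @property
    def cores_per_node(self) -> int:
        return self.sockets_per_node * self.cores_per_socket

    @property
    def cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def numa_nodes(self) -> int:
        # Each processor together with its local memory forms one NUMA node.
        return self.nodes * self.sockets_per_node


XT4 = Partition("XT4", nodes=1012, sockets_per_node=1)
XT5 = Partition("XT5", nodes=852, sockets_per_node=2)

for p in (XT4, XT5):
    print(f"{p.name}: {p.nodes} nodes x {p.cores_per_node} cores = "
          f"{p.cores} cores, {p.numa_nodes} NUMA nodes")
print("total compute cores:", XT4.cores + XT5.cores)
```

This prints 4048 cores for XT4 and 6816 cores for XT5, 10864 compute cores in total.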

The processors in both the XT4 and XT5 parts are quad-core 2.3 GHz 64-bit AMD Opteron processors (Barcelona, AMD Family 10h), except in two XT5 cabinets (180 compute nodes, 1440 cores) belonging to PRACE (Partnership for Advanced Computing in Europe), which have quad-core 2.7 GHz 64-bit AMD Opteron (Shanghai) processors. Each core has 1 GB or 2 GB of dedicated memory. The architecture of the quad-core Opteron processors is well suited to the floating-point computation and memory traffic typical of high-performance computing, which makes good sustained performance possible. Performance is further improved by running the Compute Node Linux operating system on the compute nodes, which keeps operating-system overhead to a minimum. More information: Processor architecture
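The peak figure Rpeak = 102.2 Tflop/s quoted at the beginning of this section can be reproduced from the core counts and clock frequencies above. The sketch below assumes that each of these Opteron cores can complete 4 double-precision floating-point operations per clock cycle; that per-cycle figure is an assumption of this example, not a value stated in this guide.

```python
# Theoretical peak performance: Rpeak = cores x clock (GHz) x flops per cycle.
# ASSUMPTION: each Barcelona/Shanghai Opteron core completes 4 double-precision
# floating-point operations per clock cycle.
FLOPS_PER_CYCLE = 4

# (description, cores, clock in GHz), taken from the node counts in this section
groups = [
    ("XT4, 2.3 GHz", 1012 * 4, 2.3),
    ("XT5, 2.3 GHz", (852 - 180) * 8, 2.3),
    ("XT5 PRACE, 2.7 GHz", 180 * 8, 2.7),
]

rpeak_gflops = 0.0
for name, cores, ghz in groups:
    gflops = cores * ghz * FLOPS_PER_CYCLE
    rpeak_gflops += gflops
    print(f"{name:20s} {cores:5d} cores -> {gflops / 1000:6.2f} Tflop/s")

rpeak = rpeak_gflops / 1000  # Tflop/s
rmax = 76.5                  # estimated Linpack Rmax quoted above, Tflop/s
print(f"Rpeak = {rpeak:.2f} Tflop/s, Rmax/Rpeak = {rmax / rpeak:.1%}")
```

Under this assumption the sum comes to about 102.25 Tflop/s, consistent with the quoted 102.2 Tflop/s. The estimated Linpack result then corresponds to roughly three quarters of the theoretical peak.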

The key to performance in massively parallel runs is the interconnection network between the processors. In Louhi, the interconnect is the Cray SeaStar2 communication system, which connects the processors in a three-dimensional torus, providing a bandwidth-rich environment.
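In a three-dimensional torus, every node has coordinates (x, y, z) and is linked to its neighbours in the ±x, ±y and ±z directions, and each dimension wraps around at its edges. The following minimal Python model illustrates this topology. The torus dimensions used are hypothetical, because this section does not give Louhi's actual torus size.

```python
# Minimal model of a 3-D torus: each node links to six neighbours, and every
# dimension wraps around at its edges. The torus size below is HYPOTHETICAL.
from typing import List, Tuple

Coord = Tuple[int, int, int]


def neighbours(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six torus neighbours of `node`, wrapping at the edges."""
    result = []
    for axis in range(3):
        for step in (-1, 1):
            c = list(node)
            c[axis] = (c[axis] + step) % dims[axis]
            result.append(tuple(c))
    return result


def hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal number of links between a and b, using the wrap-around links."""
    total = 0
    for ai, bi, n in zip(a, b, dims):
        d = abs(ai - bi)
        total += min(d, n - d)  # go directly or around the ring, whichever is shorter
    return total


dims = (8, 8, 8)  # hypothetical torus size
print(neighbours((0, 0, 0), dims))
print(hops((0, 0, 0), (7, 0, 0), dims))  # 1 hop thanks to the wrap-around link
print(hops((0, 0, 0), (4, 4, 4), dims))  # farthest node: 4 + 4 + 4 = 12 hops
```

The wrap-around links halve the worst-case distance along each dimension compared with a plain 3-D mesh. This is one reason a torus suits communication-heavy parallel jobs.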

The XT4 part is better suited to jobs that need a balance of compute power and interconnect bandwidth. Because fewer cores share each interconnect link in the XT4 part, large, communication-intensive distributed-memory jobs are better suited to the XT4 part. A large MPI job that does not communicate heavily between nodes is suitable for the XT5 part. The XT5 part is also a very good fit for memory-intensive jobs and for hybrid distributed/shared-memory jobs (for example, OpenMP threads within a node and MPI between nodes).
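The trade-off above comes down to how many MPI ranks share one node's connection to the torus. The following sketch compares a pure MPI job (one rank per core) on the two parts. It assumes, as on Cray XT systems generally, one SeaStar2 connection per compute node and nodes filled completely. The `layout` function is a hypothetical helper, not a scheduler interface.

```python
# Compare how a pure MPI job (one rank per core) maps onto the XT4 and XT5 parts.
# ASSUMPTIONS: one SeaStar2 connection per compute node, and ranks fill nodes
# completely. `layout` is a hypothetical helper for illustration only.
import math

CORES_PER_NODE = {"XT4": 4, "XT5": 8}


def layout(ranks: int, part: str) -> dict:
    per_node = CORES_PER_NODE[part]
    return {
        "part": part,
        "nodes": math.ceil(ranks / per_node),  # nodes needed for the job
        "ranks_per_link": per_node,            # ranks sharing one node's link
        "link_share_per_rank": 1.0 / per_node, # fraction of the link per rank
    }


for part in ("XT4", "XT5"):
    print(layout(1024, part))
```

A 1024-rank job needs 256 XT4 nodes but only 128 XT5 nodes. On XT5, however, each rank gets only half as large a share of its node's interconnect link.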

Version 2.1 of the operating system supports a NUMA-aware kernel, which takes the non-uniform memory access inside a compute node into account. On Cray XT5 compute nodes this minimizes traffic between the two sockets by using socket-local memory whenever possible.
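A hybrid job can make the most of socket-local memory by placing one MPI rank on each NUMA node (socket) of an XT5 node and running that rank's OpenMP threads only on the cores of the same socket. The following Python sketch computes such a placement. It assumes that cores 0 to 3 belong to socket 0 and cores 4 to 7 to socket 1. On the real system, placement is controlled by the job launcher, whose options are not covered here.

```python
# NUMA-friendly hybrid MPI/OpenMP layout on XT5 compute nodes: one MPI rank per
# socket, with its OpenMP threads on that socket's cores only.
# ASSUMPTION: cores 0-3 are on socket 0 and cores 4-7 on socket 1.
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 4


def numa_node_of(core: int) -> int:
    """NUMA node (socket) that core 0..7 of an XT5 node belongs to."""
    return core // CORES_PER_SOCKET


def hybrid_layout(nodes: int) -> list:
    """Return (node, rank, socket, cores) tuples, one rank per socket."""
    placement = []
    rank = 0
    for node in range(nodes):
        for socket in range(SOCKETS_PER_NODE):
            first = socket * CORES_PER_SOCKET
            cores = list(range(first, first + CORES_PER_SOCKET))
            placement.append((node, rank, socket, cores))
            rank += 1
    return placement


for node, rank, socket, cores in hybrid_layout(2):
    # Every thread of a rank runs on its rank's own socket, so memory stays local.
    assert all(numa_node_of(c) == socket for c in cores)
    print(f"node {node}: rank {rank} -> socket {socket}, threads on cores {cores}")
```

Keeping each rank's threads within one socket means the memory those threads allocate and use can stay on that socket, which is the traffic the NUMA-aware kernel tries to avoid.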

In addition to the compute nodes, Louhi has 22 service nodes, located in two XT4 cabinets (c0-0 and c0-1) and running SuSE Enterprise Linux, which are used for login, I/O, boot and other services. Louhi has 70 TB of fast local disk reserved for temporary work space and user applications.

Compute nodes differ in three critical respects: the node type (XT4 or XT5), the memory per core (and thus per node), and the processor clock frequency (2.3 GHz, or 2.7 GHz on the PRACE nodes). Mixing different kinds of compute nodes in one job may cause problems when the job is sent to execution. See: Cray XT system
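The three properties above can be treated as a node "class", and a job is homogeneous only if all of its nodes share one class. The following Python sketch shows such a check. The node names and the memory and clock values in the example job are hypothetical and do not describe actual Louhi nodes.

```python
# Check that a set of compute nodes is homogeneous in the three respects listed
# above: node type, memory per core and clock frequency.
# The node records below are HYPOTHETICAL examples, not real Louhi nodes.
from typing import Iterable, NamedTuple


class NodeSpec(NamedTuple):
    name: str
    node_type: str         # "XT4" or "XT5"
    mem_per_core_gb: int   # 1 or 2
    clock_ghz: float       # 2.3, or 2.7 on the PRACE nodes


def node_class(n: NodeSpec) -> tuple:
    return (n.node_type, n.mem_per_core_gb, n.clock_ghz)


def mixed_classes(nodes: Iterable[NodeSpec]) -> set:
    """Return the distinct node classes; more than one means a mixed job."""
    return {node_class(n) for n in nodes}


job = [
    NodeSpec("nodeA", "XT5", 1, 2.3),
    NodeSpec("nodeB", "XT5", 1, 2.3),
    NodeSpec("nodeC", "XT5", 2, 2.7),  # hypothetical node of a different class
]
classes = mixed_classes(job)
if len(classes) > 1:
    print("warning: job mixes node classes:", sorted(classes))
```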

The 20 cabinets are physically on the floor of the computer hall in two rows, 10 cabinets in each row. See: Cray XT system