New Computing and Data infrastructure: Puhti, Mahti and Allas
Scientists have made use of CSC's supercomputers for some 30 years. Since the 1970s, CSC and its predecessor have hosted Finland's fastest computers, starting with the Univac 1108 in 1971 and the Vax 8600 in 1985, followed by the first supercomputer, the Cray X-MP, which was taken into use in the autumn of 1989.
Taito and Sisu were originally installed in 2012, and their computing power was increased with a major upgrade in 2014. Five years have passed since then, which is nearing the standard retirement age for a supercomputer. Thanks to continuous improvements in the efficiency of processors and other components, the same computing power can now be achieved with significantly smaller hardware that consumes just a fraction of the power. Conversely, significantly more computing power and storage capacity can be obtained for the same amount of power.
In 2015, CSC began to prepare its next upgrade. The first task was to determine what needs and visions scientists had: what kinds of resources, and how much of them, will be needed in the future? We engaged in dialogue, conducted user surveys, held workshops at just about every Finnish university and interviewed top scientists. The resulting report showed a clear need for new infrastructure, with data and its use playing a particularly important role.
Together with research and innovation actors, the Ministry of Education and Culture launched the Data and Computing 2021 development programme (DL2021). In the development programme, EUR 33 million in funding was granted to the procurement of a new computing and data management environment, in addition to which the Finnish Government granted EUR 4 million from the supplementary budget for the development of artificial intelligence.
Supercomputer Puhti (2019). Photo: Mikael Kanerva, CSC.
The new hardware will serve six primary purposes:
1) Large-scale simulations: This group represents traditional high performance computing (HPC). These are utilized in physics and in various related fields. Challenging scientific questions are studied by massive computing, for example by high-precision simulations of nuclear fusion, climate change, and space weather.
2) Medium-scale simulations: This category covers a large part of the usage of the computing resources provided by CSC. These simulations span a wide range of disciplines, from biophysical studies of cell functions to materials science and computational fluid dynamics. For this type of simulation, it is particularly important to enable workflows that allow a large number of simulations to be run and provide efficient means to handle the resulting data. The created data requires efficient analysis methods utilizing data-intensive computing and artificial intelligence.
3) Data-intensive computing: This use case covers analysis and computing with big data based on extensive source material. The largest group of data-intensive computing users at CSC are currently the bioinformaticians. Other large user groups include language researchers and researchers of other digital humanities and social sciences.
4) Data-intensive computing using sensitive data: Research material often contains sensitive information that cannot be disclosed outside the research group and is governed by a number of regulations, including the Personal Data Act and, from May 2018 on, the EU General Data Protection Regulation. In addition to the needs of data-intensive research in general, managing sensitive data requires e.g. environments with elevated data security and tools for handling authentication and authorization. Some examples include biomedicine dealing with medical reports and humanities and social sciences utilizing information acquired from informants and registries.
5) Artificial intelligence: Machine learning methods are applied to many kinds of scientific challenges, and their use is rapidly expanding to various scientific disciplines, including life sciences, humanities and social sciences. Machine learning is typically applied to analysis and categorization of scientific data. Easy access to large datasets, like microscope images and data repositories, is crucial for the efficiency of the artificial intelligence workload.
6) Data streams: Many important scientific datasets consist of constantly updated data streams. Typical sources for these kinds of data streams include satellites with measuring instruments, weather radars, networks of sensors, stock exchange prices, and social media messages. Additionally, there are data streams emanating from the infrastructure itself and between its integrated components.
Supercomputers Puhti and Mahti
Two independent systems will provide computing power for CSC in the future: Puhti and Mahti.
Puhti is a supercomputer intended to support many of the above-mentioned purposes. It offers 664 nodes for medium-sized simulations, each with plenty of memory (192 GB or 384 GB) and 40 cores representing the latest generation of the Intel Xeon processor architecture. These nodes are connected with an efficient Infiniband HDR interconnect network, which allows multiple nodes to be used simultaneously. Some quantum chemistry applications benefit a great deal from fast local drives, which are found in 40 of the nodes. The same nodes can be used for data-intensive applications, and in addition the supercomputer has 18 large-memory nodes that contain up to 1.5 TB of memory each.
One of the hottest topics right now is artificial intelligence. In science, its use is constantly increasing, both in data processing and as part of simulations. To support this, Puhti has an accelerated partition, Puhti-AI, which contains 80 GPU nodes, each with four Nvidia Volta V100 GPUs. These nodes are very tightly interconnected, allowing simulations and artificial intelligence workloads spanning multiple nodes to get as much out of the GPUs as possible. The majority of current machine learning workloads use only one GPU, but the trend is toward larger training tasks, and the new hardware makes it possible to use multiple nodes at the same time. The new Intel processors (Cascade Lake) also include the new Vector Neural Network Instructions (VNNI), which can accelerate inference workloads by as much as a factor of 10. The supercomputer's work disk is 4.8 PB.
In the procurement of Puhti, CSC and the Finnish Meteorological Institute (FMI) collaborated to extend Puhti with a dedicated research cluster for the FMI. This 240-node partition is fully funded by the FMI and is logically separated from the main Puhti system, while the hardware is fully integrated. In total, the joint machine has 1002 nodes.
Mahti is being installed in the Kajaani Datacenter, in the same room where Sisu was. Unlike Puhti, Mahti is fully liquid cooled. In terms of datacenter technology, the new supercomputer is a major improvement over Sisu: Mahti's liquid cooling system uses warm water (just under 40 degrees Celsius), whereas Sisu required chilled water. As a result, Mahti can be cooled more affordably and efficiently. Mahti is a purebred supercomputer containing almost 180 000 CPU cores in 1404 nodes. Each node has two next-generation 64-core AMD EPYC 7H12 processors running at 2.6 GHz, giving the whole system a theoretical peak performance of 7.5 Pflops. This version of the AMD EPYC processor is the fastest CPU currently available and will give Finnish science a unique competitive advantage. There is 256 GB of memory per node, so even large-scale simulations requiring a large amount of memory can be run effectively. The supercomputer's work disk is over 8 PB.
Puhti:
- 682 nodes, with two 20-core Intel Xeon Gold 6230 processors, running at 2.1 GHz
- Theoretical computing power 1.8 Pflops
- 192 GB - 1.5 TB memory per node
- High-speed 100 Gbps Infiniband HDR interconnect network between nodes
- 4.8 PB Lustre parallel storage system
Puhti-AI:
- 80 nodes, each with two Intel Xeon Gold 6230 processors and four Nvidia Volta V100 GPUs
- Theoretical computing power 2.7 Pflops
- 3.2 TB of fast local storage in the nodes
- High-speed 200 Gbps Infiniband HDR interconnect network between nodes
Mahti:
- 1404 nodes with two 64-core AMD EPYC processors (Rome) running at 2.6 GHz
- Theoretical computing power 7.5 Pflops
- 256 GB of memory per node
- High-speed 200 Gbps Infiniband HDR interconnect network between nodes
- 8.7 PB Lustre parallel storage system
Allas data management solution
Growth in the volume of data and the need for different approaches to sharing it also pose new challenges for data management. A file system based on a conventional directory hierarchy does not fully meet future needs where, for example, the scalability of storage systems and the sharing and re-use of data are concerned.
Allas is CSC's new data management solution, based on object storage technology. The 12 PB system offers new possibilities for data management, analysis and sharing. Data is stored in the system as objects, which for most users are simply files. In contrast to a conventional file system, files can be referred to in other ways than by their name and location in a directory hierarchy, as the system assigns a unique identifier to each object. In addition, arbitrary metadata can be attached to each object, allowing for a more multifaceted description of the data.
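The object-storage idea described above (system-assigned unique identifiers plus free-form metadata, with no directory hierarchy) can be illustrated with a small, self-contained Python sketch. This is a toy in-memory model only; the class and method names are invented for illustration and are not the Allas API:

```python
import uuid

class ToyObjectStore:
    """A minimal in-memory model of object storage: each object gets a
    system-assigned unique identifier and may carry arbitrary metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, data, metadata=None):
        # The store, not the user, assigns a globally unique identifier;
        # there is no path in a directory tree.
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (bytes(data), dict(metadata or {}))
        return object_id

    def get(self, object_id):
        # Objects are retrieved by identifier rather than by path.
        return self._objects[object_id][0]

    def metadata(self, object_id):
        return self._objects[object_id][1]

store = ToyObjectStore()
oid = store.put(b"radar measurements",
                {"instrument": "weather-radar", "year": "2019"})
print(store.metadata(oid)["instrument"])  # prints "weather-radar"
```

In a real object store such as Allas, the same ideas appear as buckets, objects and object metadata, accessed over standard object-storage protocols rather than through a local class.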
Data stored in Allas is available on CSC's supercomputers and cloud platforms as well as from any location over the Internet. In the simplest case, users can upload and retrieve data from their own computer using just a web browser. Allas also facilitates the sharing of data: users can share the data they choose with individual users or even with the whole world. Allas also offers a programming interface, which can be used to build a wide variety of services on top of it.
One example of a new use case is data (possibly of very high volume) generated by an instrument and streamed directly to Allas. The data can then be analyzed on CSC supercomputers, and the results saved back to Allas, from which they are easy to share with partners.
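A streamed transfer like this is typically performed in fixed-size parts, each checksummed independently, in the style of a multipart upload to object storage. A minimal, stdlib-only Python sketch of that pattern (the function name and chunk size are illustrative and not part of any Allas client):

```python
import hashlib
import io

def stream_in_chunks(stream, chunk_size):
    """Read a byte stream in fixed-size chunks, yielding each chunk
    together with its MD5 checksum, as a multipart upload would."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk, hashlib.md5(chunk).hexdigest()

# Simulate an instrument stream with an in-memory buffer.
parts = list(stream_in_chunks(io.BytesIO(b"abcdefgh"), chunk_size=4))
print(len(parts))  # prints 2: the parts b"abcd" and b"efgh"
```

Per-part checksums let the receiving end verify each piece as it arrives, so a corrupted or dropped part can be retransmitted without restarting the whole stream.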
Data management system Allas (2019). Photo: Mikael Kanerva, CSC.
A broad spectrum of scientific problems in pilot projects
During the Puhti supercomputer acceptance phase, a limited number of Grand Challenge research projects were given an opportunity to use the extremely large computing resources. An effort was made to take the various computing needs behind the supercomputer procurement into account when selecting pilot projects. The selected projects varied from conventional, large-scale simulations to research conducted using artificial intelligence, and the researchers studied a wide range of topics from astrophysics to personalized medicine. The rise of AI as a part of the workflow was a big trend, and 61% of all resources were used by projects which had, or planned to have, AI as a part of their work.
The pilot period was very successful in testing the system. The projects were able to generate a very high load on the system and thus confirm that it was usable with real workloads. Several projects were also able to make significant progress in their research during the piloting period. Due to the testing nature of the acceptance phase, some projects faced technical problems, but these experiences were also valuable, helping CSC improve the functionality of the system. In successful projects, the performance of Puhti was generally somewhat better than on Sisu, both in terms of parallel scalability and single-core performance.
A new group of Grand Challenge pilot projects will be selected at the end of 2019 for the acceptance phase of the Mahti supercomputer. We look forward to seeing what kinds of scientific challenges await!
The Puhti supercomputer was opened to customers on 2.9.2019 and the Allas data management solution on 2.10.2019. Researchers working at Finnish universities and research institutes may apply for access rights and computing resources in the CSC Customer Portal at https://my.csc.fi.
The software offering in Puhti is currently more limited than in Taito, but new software is being installed almost daily, and the user documentation is continuously being extended. CSC will also organize several training sessions on the use of the environment for both new and experienced users in 2019-2020; the first Puhti porting and optimisation workshop has already been held.
CSC supercomputers and superclusters
1989 Cray X-MP
1995 Cray C94
1997 Cray T3E
1998 SGI Origin 2000
2000 IBM SP Power3
2002 IBM p690 Power 4
2007 Cray XT4 (Louhi)
2007 HP Proliant CP400 (Murska)
2012 Cray XC40 (Sisu)
2013 HP Apollo 6000 XL230a/SL230s Supercluster (Taito)
2019 Atos BullSequana X400 (Puhti)
(2020 Atos BullSequana XH2000 (Mahti))
Sebastian von Alfthan and Jussi Enkovaara are high performance computing experts at CSC.
You might have heard news about LUMI, the European pre-exascale computer that will be hosted by CSC. LUMI will be a huge addition to the computational resources available to Finnish researchers from 2021 on, but we will come back to the story of LUMI later.
Published originally 30.09.2019.
Dr Sebastian von Alfthan is the manager of the HPC support group at CSC. Follow him on Twitter: @SvAlfthan