CSC Blog has moved

Find our blogs at www.csc.fi/blog.

This site is an archive version and is no longer updated.
 



At the end of November 2019, a new type of virus began to infect humans in China. The virus was named SARS-CoV-2 (acute respiratory syndrome-related coronavirus 2), and WHO started calling the viral illness COVID-19.

According to the models produced by Imperial College London and the Finnish Institute for Health and Welfare, the biggest wave of coronavirus will hit Europe in April and May. The impacts of the wave are already being felt. It has been reported that in Italy, about 500 people died of complications from coronavirus within a single 24-hour period.

Scientists have identified more than one thousand different viruses affecting humans. Many of them cause infections, which are normal biochemical reactions of the human body to the infectious agents. Thus, it was only a matter of time before a pathogen causing a serious infection such as coronavirus would appear. We are surrounded by a vast variety of microbiological life forms that are invisible to the human eye. In fact, only two generations ago, tens of thousands of people were infected with viral diseases, such as measles and smallpox, every year.

There is still no treatment or vaccine for coronavirus and the measures now used are the same as during the Spanish influenza pandemic 100 years ago: schools and places of entertainment are closed and there are restrictions on people’s movements.

Smallpox is caused by the variola virus. Regular vaccinations against smallpox began in the 1950s and the disease has been practically eradicated from the world. Studying the behavior of viruses and developing treatments are long-term activities. When we are developing vaccines against new viruses threatening human lives and paralyzing societies, we rely on existing information produced by research.

Basic research helping us to prepare for crises requires sustained funding. Reactions to health crises cannot be based on agile on-demand business logic, and we cannot start building the infrastructure when the crisis is already underway.

Research infrastructures are places where virus data is stored and where it is available. These ecosystems consist of hardware, information networks, databases, documents and services. They form a global information exchange network and provide a basis for research cooperation at different stages across national borders.

The exchange of information must be carried out in a reliable manner. Experience has shown that a key actor in this process is a research infrastructure that collects, maintains, stores and combines the findings of biological and medical research together with the data they require. This data includes molecular biological information as well as the structures, functions and safety of medicinal substances.

Reliable international research databases include the European Nucleotide Archive (ENA), where the coronavirus genome is also available, and Universal Protein Resource (UniProt), which collects data on the functions of proteins, cell parts and organisms.
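To make this concrete, below is a minimal sketch, not part of the original post, of how a researcher might fetch a SARS-CoV-2 genome record from ENA programmatically. The accession number and the browser API endpoint shown are assumptions based on ENA's public REST interface; check ENA's documentation for the current details.

```python
# Minimal sketch: fetch a SARS-CoV-2 genome sequence from the ENA browser API.
# The accession (MN908947.3, the Wuhan-Hu-1 reference) and the endpoint URL are
# assumptions; consult ENA's documentation for the current interface.
import requests

ACCESSION = "MN908947.3"
URL = f"https://www.ebi.ac.uk/ena/browser/api/fasta/{ACCESSION}"

response = requests.get(URL, timeout=60)
response.raise_for_status()

header, *seq_lines = response.text.splitlines()
sequence = "".join(seq_lines)

print(header)                                  # FASTA description line
print(f"Genome length: {len(sequence)} bases")  # roughly 30 kb for SARS-CoV-2
```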

When researchers are developing a vaccine against coronavirus, they use open biological databases and data-intensive computing. CSC is a partner in this effort. It has opened a priority lane facilitating coronavirus research and provides access to supercomputing and management of data across national borders (European Data Space, Digital Europe).

A prerequisite for fast, knowledge-based decision-making and reactions is that the parties responsible for information (such as data controllers) make the information interoperable and machine-readable. Data is collected for such purposes as statistics, healthcare and decision making, but in a crisis, such data should also be made available for secondary purposes that differ from the original purpose (such as scientific research). Provisions on such applications are contained in the Finnish act on secondary uses of social and health data, which entered into force in spring 2019.

How should we prepare for future tsunamis with impacts similar to those of coronavirus? Systematic data collection, research, and maintenance of a data infrastructure covering viral and bacterial ecosystems would be the best form of risk management. Vaccine candidates and drug compounds could be developed pre-emptively against newly discovered viruses, so that when a crisis breaks out, healthcare treatments could be developed more quickly. In the final analysis, even investments of hundreds of millions of euros in this infrastructure and expertise seem almost like pocket change in the light of rapidly rising mortality and a deep, prolonged global economic crisis.

Further information:

Coronavirus scenarios for the next 18 months 

CSC offers resources for efforts against COVID-19 pandemic

CSC's preparations for the exceptional situation caused by coronavirus

Recent research on COVID-19 SARS-CoV-2

Viruses contagious to humans

Forecast of COVID-19 spread produced by Imperial College London

Diseases eradicated with vaccines (in Finnish)

Developing pharmaceuticals through computing (in Finnish)

Developing new prodrugs for COVID-19 protein with computational methods

European bioinformatics infrastructure ELIXIR

Image: Adobe Stock
Table: Picture modified from publicly shared Imperial College COVID-19 Response Team article from https://www.imperial.ac.uk/

 

Tommi Nyrönen

Dr. Tommi Nyrönen leads a team of experts in the European Life Science Infrastructure for Biological information ELIXIR at CSC.

tommi.nyronen(at)csc.fi

 
Twitter: @nyronen
Linkedin: https://www.linkedin.com/in/nyronen
Tel. +358503819511
Blogger: Tommi Nyrönen. Topics: Data, HPC, Science and research.

The EU’s digital policy has been a hot topic lately. Last week the European Commission published three significant strategies concerning digitalization, artificial intelligence and data. At the same time, the member states, the European Parliament and the Commission are fighting over the size and priorities of the EU’s budget for the next seven years. The importance of the topic is emphasized by the fact that digital solutions are crucial for the realization of the Green Deal, the Commission’s other top priority.
 
The guiding document for the EU’s digital policy is the communication named “Shaping Europe’s digital future”. The communication draws together all the major policies concerning digitalization that the Commission intends to introduce during its five-year term. As in Finland’s government program, many topics are promised to be specified in strategies and reports that will be published later.

There are many favorable objectives in the communication. For example, climate neutrality of data centers by 2030 is easy to support. As for CSC, there is no need to wait ten years; our data center in Kajaani is already carbon neutral. Next year it will even become carbon negative, as the excess heat generated by our world-class supercomputer LUMI will be fed into the district heating network.

In order to achieve the carbon neutrality goal, it is necessary to place big, EU-funded computing facilities in environments where computation can be done ecologically. This opens up a possibility for Finland to act as a forerunner. Other welcome proposals include investing in Europe’s strategic digital capacities and better access to health data, which will advance its use in research.

European perspectives on AI and data will sharpen with time

The Commission opened a discussion about the EU’s perspective on artificial intelligence (AI) in the form of a white paper. AI has great potential to serve the common good in areas such as the health sector and transportation, as well as in optimizing energy consumption. However, the use of AI also involves risks, which is why adequate legislation needs to be in place. The first half of the white paper lists actions that the EU plans to take in order to develop Europe’s AI capacities. It is especially important to develop high-performance computing, to deploy the FAIR principles (findability, accessibility, interoperability and re-usability) for data and to develop skills.

As for legislation, the white paper proposes new rules only for high-risk applications of AI. This is a sensible approach, as many of the existing laws already apply to AI. In addition, AI applications are very different in nature, and thus it does not make sense to regulate them all in the same manner. For non-high-risk applications, the Commission proposes the creation of a voluntary labelling system. Whatever actions are taken with the regulation of AI, it is of utmost importance that there is one common set of rules for AI in the EU. It is the only way we can realize a true single market for data and AI.

In the third new document, the data strategy, the EU aims to advance the usage and movement of data between member states and organizations. This will be realized by developing data sharing infrastructures and principles, such as interoperability and machine readability. A good starting point for interoperability is the European Interoperability Framework: data must be interoperable at the technical, organizational, legal and semantic levels. A completely new proposal is to create data spaces for certain strategic sectors; together these data spaces will form a single European data space. In the research world, this kind of data space has already been developed in the form of the European Open Science Cloud, which should be a good benchmark for others to start from. In order to succeed, it is vital that data really is interoperable, and that it moves between the sectors (research, public administration, business).

These three strategies lay out the digital policy of the EU for the next five years. However, in addition to policies and strategies, digitalization needs funding. Currently, the EU institutions and member states are negotiating the multiannual financial framework (MFF), i.e. the 7-year budget of the EU. The MFF will determine how much EU money will be spent on research, digitalization and competence building during the next seven years. Science and research get much attention in politicians’ speeches, but in the EU’s budget they seem once again to be superseded by old priorities. However, the negotiations are still far from done, so there remains hope that the Commission’s new policies will get some financial support to back them up.

 

Blogger: Ville Virtanen. Topics: Science and research, Data, HPC.


Starting January 1st, 2020, Schrödinger’s Maestro GUI, Small Molecule Drug Discovery, and Materials Science software applications are available for free for all academic users in Finland. At the same time, we have scaled down the licenses for Biovia's Discovery Studio and Materials Studio. Please also check out the rest of the application portfolio at docs.csc.fi/apps.

Maestro is available on Puhti, and also for local installation

The Maestro GUI can be downloaded and installed on your local computer and heavier calculations or complete modeling workflows can be run on CSC’s Puhti supercomputer.

Schrödinger’s software platform integrates first-in-class solutions for predictive modeling, data analytics, and collaboration to enable rapid exploration of chemical space. An integrated graphical user interface allows users to design multi-stage workflows and run them on supercomputers. Schrödinger software has already been used in several research groups and companies in Finland for years, but especially new users will benefit from a wide selection of training materials.

Hands-on workshops to facilitate newcomers and power users

In November 2019 and January 2020 CSC organized training events with Schrödinger specialists Dr. Stefan Ehrlich, Dr. Simon Elliott and Dr. Laura Scarbath-Evers to bring new users up to speed on the best ways to apply the software.

The November workshop targeted the Drug Discovery and Small Molecule researchers and it started with a beginner’s day followed by another with advanced topics. The January workshop covered basic usage for Materials Science and featured case examples related to atomic layer deposition (ALD), which is a particular speciality of Finnish materials R&D. Both workshops were streamed online for remote participants.

Simon Elliott presenting how high throughput computational screening can help in the search for better ALD precursor molecules.

According to Simon Elliott, "Schrödinger’s Materials Science platform has specialized structure builders for everything from soft polymers to hard surfaces, smoothly interfacing with efficient molecular mechanics or quantum mechanics computations."

The workshop included a quick introduction to quantum mechanics and reciprocal space, before exploring the workflows in the Quantum Espresso graphical interface.

Dr. Laura Scarbath-Evers actually has prior experience on CSC resources. She was visiting Prof. Patrick Rinke from Aalto University in 2018 via the HPC-Europa3 research mobility programme.

“The research mobility stay with HPC-Europa3 was an amazing experience and I would recommend it to any other researcher who is already doing modelling or plans to get involved with it. It enhances the exchange within the scientific community and helps to build international collaborations. Additionally, researchers who come to the CSC - IT Center for Science for their HPC-Europa3 stay can now use the Schrödinger Materials suite which is another huge benefit.”

How to get access to the Schrödinger software?

The complete CSC software selection can be browsed online in our new user guide: docs.csc.fi/apps. Select Maestro from the list and follow the instructions. You will need to create an account at Schrödinger to download the installation file (for Windows, Linux or Mac). Please select the non-academic, full functionality version. Note that there are four updates every year and that installation requires admin privileges on your computer. Also note the instructions on how to configure your installation to use the national license.

Master Maestro quickly

Schrödinger has lots of good material online either for self-study or to be used in training; please consult our Maestro page for recommendations.

Schrödinger also arranges intro online sessions. In February, they will have online Q&A sessions (2/week) so that you can log in at any time and ask anything you're interested in.

If you would like to have training in some particular topic, let us or Schrödinger know and we’ll organize an event given enough interest. In any case, hands-on training events are planned at CSC in autumn 2020, which will also be streamed online for remote participants.

Limited access for Discovery Studio and Materials Studio in 2020

For 2020, CSC has obtained a limited license for Biovia's Discovery Studio and Materials Studio. The same functionality will be available as before, but the maximum number of simultaneous users will be limited. Thus, it may occur that on some occasions Discovery Studio or Materials Studio cannot be used because all the licenses are in use. Therefore, if you're not actively using it, please close the GUI. In the longer run, please consider migrating to Maestro or some other software available through CSC.
 

 

Blogger: Atte Sillanpää. Topics: Software, Science and research.

You may have seen the news about the opening of Puhti and Allas. How did we get to this point, and how does it fit in with the overall roadmap?

Scientists have made use of CSC's supercomputers for some 30 years. Since the 1970s, CSC and its predecessor have hosted Finland's fastest computers, starting with the Univac 1108 in 1971 and the Vax 8600 in 1985, and finally the first supercomputer, the Cray X-MP, which was taken into use in the autumn of 1989.

Taito and Sisu were originally installed in 2012, and their computing power was improved with a major update in 2014. It has been 5 years since then, which is nearing the standard retirement age for a supercomputer. Due to continuous improvements in the efficiency of processors and other components, the same computing power can be achieved with significantly smaller hardware, which also consumes just a fraction of the power. On the other hand, significantly more computing power and storage space can be achieved using the same amount of power.

In 2015, CSC began to prepare its next update. The first task was to determine what needs and visions scientists had. What kinds of resources and how much of them will be needed in the future? We engaged in dialogue, conducted user surveys, held workshops in just about every Finnish university and interviewed top scientists. The report showed that there was a need for new infrastructure, with data and its use playing a particularly important role.

Together with research and innovation actors, the Ministry of Education and Culture launched the Data and Computing 2021 development programme (DL2021). In the development programme, EUR 33 million in funding was granted to the procurement of a new computing and data management environment, in addition to which the Finnish Government granted EUR 4 million from the supplementary budget for the development of artificial intelligence.

Supercomputer Puhti (2019). Photo: Mikael Kanerva, CSC.

The new hardware will serve six primary purposes:

1) Large-scale simulations: This group represents traditional high performance computing (HPC). These are utilized in physics and in various related fields. Challenging scientific questions are studied by massive computing, for example by high-precision simulations of nuclear fusion, climate change, and space weather.

2) Medium-scale simulations: This category covers a large part of the usage of the computing resources provided by CSC. These simulations span a wide range of disciplines, from topics like biophysical studies of cell functions to materials science and computational fluid dynamics. For this type of simulation, it is particularly important to enable workflows that allow a large number of simulations and provide efficient means to handle the resulting data. The created data requires efficient analysis methods utilizing data-intensive computing and artificial intelligence.

3) Data-intensive computing: This use case covers analysis and computing with big data based on extensive source material. The largest group of data-intensive computing users at CSC are currently the bioinformaticians. Other large user groups include language researchers and researchers of other digital humanities and social sciences.

4) Data-intensive computing using sensitive data: Research material often contains sensitive information that cannot be disclosed outside the research group and is governed by a number of regulations, including the Personal Data Act and, from May 2018 on, the EU General Data Protection Regulation. In addition to the needs of data-intensive research in general, managing sensitive data requires e.g. environments with elevated data security and tools for handling authentication and authorization. Some examples include biomedicine dealing with medical reports and humanities and social sciences utilizing information acquired from informants and registries.

5) Artificial intelligence: Machine learning methods are applied to many kinds of scientific challenges, and their use is rapidly expanding to various scientific disciplines, including life sciences, humanities and social sciences. Machine learning is typically applied to analysis and categorization of scientific data. Easy access to large datasets, like microscope images and data repositories, is crucial for the efficiency of the artificial intelligence workload.

6) Data streams: Many important scientific datasets consist of constantly updated data streams. Typical sources for these kinds of data streams include satellites with measuring instruments, weather radars, networks of sensors, stock exchange prices, and social media messages. Additionally, there are data streams emanating from the infrastructure and between its integrated components.

Supercomputers Puhti and Mahti

Two independent systems will provide computing power for CSC in the future: Puhti and Mahti.  

Puhti is a supercomputer intended to support many of the above-mentioned purposes. It offers 664 nodes for medium-sized simulations with plenty of memory (192 GB or 384 GB) and 40 cores representing the latest generation of the Intel Xeon processor architecture. These nodes are combined with an efficient Infiniband HDR interconnect network, which allows for the simultaneous use of multiple nodes. Some quantum chemistry applications benefit a great deal from fast local drives, which are found in 40 nodes. The same nodes can be used for data-intensive applications, and in addition the supercomputer has 18 large-memory nodes that contain up to 1.5 TB of memory.

One of the hottest topics right now is artificial intelligence. In science, its use is constantly increasing both in data processing and as part of simulations. To support this, Puhti has an accelerated partition, Puhti-AI, which contains 80 GPU nodes, each with four Nvidia Volta V100 GPUs. These nodes are very tightly interconnected, allowing simulations and artificial intelligence work using multiple nodes to get as much out of the GPUs as possible. The majority of current machine learning workloads use only one GPU, but the trend is toward larger learning tasks, and the new hardware makes it possible to use multiple nodes at the same time. The new Intel processors (Cascade Lake) also include the new Vector Neural Network Instructions (VNNI), which accelerate inference workloads by as much as a factor of 10. The supercomputer's work disk is 4.8 PB.

In the procurement of Puhti, CSC and the Finnish Meteorological Institute (FMI) collaborated to extend Puhti with a dedicated research cluster for the FMI. This 240-node partition is fully funded by the FMI and is logically separated from the main Puhti system, while the hardware is fully integrated. In total, this means that the joint machine has 1002 nodes.

Mahti is being installed in the Kajaani Datacenter in the same room where Sisu was. Unlike Puhti, Mahti is fully liquid cooled. In terms of datacenter technology, the new supercomputer is a major improvement over Sisu. Mahti's liquid cooling system uses warm water (just under 40 degrees), as opposed to Sisu, which required chilled water. As a result, Mahti can be cooled more affordably and efficiently. Mahti is a purebred supercomputer containing almost 180 000 CPU cores in 1404 nodes. Each node has two next-generation 64-core AMD processors (EPYC 7H12) running at 2.6 GHz, making the theoretical peak performance of the whole system 7.5 Pflops. This version of the AMD EPYC processor is the fastest CPU currently available and will give Finnish science a unique competitive advantage. There is 256 GB of memory per node, so even large-scale simulations requiring a large amount of memory can be run effectively. The supercomputer's work disk is over 8 PB.

 

Puhti

  • 682 nodes, with two 20-core Intel Xeon Gold 6230 processors, running at 2.1 GHz
  • Theoretical computing power 1.8 Pflops
  • 192 GB - 1.5 TB memory per node
  • High-speed 100 Gbps Infiniband HDR interconnect network between nodes
  • 4.8 PB Lustre parallel storage system

Puhti-AI

  • 80 nodes, each with two Intel Xeon Gold 6230 processors and four Nvidia Volta V100 GPUs
  • Theoretical computing power 2.7 Pflops
  • 3.2 TB of fast local storage in the nodes
  • High-speed 200 Gbps Infiniband HDR interconnect network between nodes

Mahti

  • 1404 nodes with two 64 core AMD EPYC processors (Rome) running at 2.6 GHz
  • Theoretical computing power 7.5 Pflops
  • 256 GB of memory per node
  • High-speed 200 Gbps Infiniband HDR interconnect network between nodes
  • 8.7 PB Lustre parallel storage system
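As a quick sanity check on the theoretical peak figures listed above, the numbers can be reproduced from the node counts, core counts and clock frequencies. The flops-per-cycle-per-core values below are my assumption, not stated in the post: 32 double-precision flops per cycle for Puhti's AVX-512-capable Xeons and 16 for Mahti's AVX2-capable EPYC processors.

```python
# Rough reproduction of the theoretical peak performance figures quoted above.
# Flops-per-cycle-per-core values are assumptions based on the vector units of
# the respective CPU generations (AVX-512 vs. AVX2, each with two FMA units).
def peak_pflops(nodes, cores_per_node, ghz, flops_per_cycle):
    return nodes * cores_per_node * ghz * 1e9 * flops_per_cycle / 1e15

puhti = peak_pflops(nodes=682,  cores_per_node=2 * 20, ghz=2.1, flops_per_cycle=32)
mahti = peak_pflops(nodes=1404, cores_per_node=2 * 64, ghz=2.6, flops_per_cycle=16)

print(f"Puhti CPU partition: ~{puhti:.1f} Pflops")  # ~1.8 Pflops
print(f"Mahti:               ~{mahti:.1f} Pflops")  # ~7.5 Pflops
```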


Allas data management solution

Growth in the volume of data and the need for different approaches to sharing it also pose new challenges for data management. A file system based on a conventional directory hierarchy does not fully meet future needs where, for example, the scalability of storage systems and the sharing and re-use of data are concerned.

Allas is CSC's new data management solution, which is based on object storage technology. The 12 PB system offers new possibilities for data management, analysis and sharing. Data is stored in the system as objects, which for most users are just files. As opposed to a conventional file system, files can be referred to in ways other than by their name and location in the directory hierarchy, as the system assigns a unique identifier to each object. In addition, arbitrary metadata can be added to each object, allowing for a more multifaceted description of the data.

Data stored in Allas is available on CSC's supercomputers and cloud platforms as well as from any location over the Internet. In the simplest case, users can add and retrieve data from their own computer simply through a web browser. Allas also facilitates the sharing of data, as users can share the data they choose with either individual users or even with the whole world. Allas also offers a programming interface, which can be used to build a wide variety of services on top of it.
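As an illustration of the programming interface, here is a minimal sketch of storing an object with custom metadata through an S3-compatible client. The endpoint URL, bucket name and credentials are placeholders and assumptions for illustration only; see CSC's Allas documentation for the actual connection details.

```python
# Minimal sketch: store and retrieve an object with custom metadata via an
# S3-compatible API. The endpoint, bucket name and credentials are placeholders;
# consult CSC's Allas documentation for the real configuration.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://a3s.fi",          # assumed object storage endpoint
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

bucket = "my-project-bucket"                # hypothetical bucket name
s3.create_bucket(Bucket=bucket)             # no-op sketch step if it already exists

# Objects can carry arbitrary user metadata in addition to the data itself.
with open("run-001.csv", "rb") as f:
    s3.put_object(
        Bucket=bucket,
        Key="measurements/run-001.csv",
        Body=f,
        Metadata={"instrument": "sensor-42", "campaign": "2019-autumn"},
    )

# Retrieve the object and its metadata later, from any machine with access.
obj = s3.get_object(Bucket=bucket, Key="measurements/run-001.csv")
print(obj["Metadata"])
```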

One example of the new use cases is data (possibly even very high volume) generated by an instrument, which can be streamed directly to Allas. The data can then be analyzed using CSC supercomputers, and the results can be saved back to Allas, from which it is easy to share the results with partners.

Data management system Allas (2019). Photo: Mikael Kanerva, CSC

A broad spectrum of scientific problems in pilot projects

During the Puhti supercomputer acceptance phase, a limited number of Grand Challenge research projects were given an opportunity to use the extremely large computing resources. An effort was made to take the various computing needs behind the supercomputer procurement into account when selecting pilot projects. The selected projects varied from conventional, large-scale simulations to research conducted using artificial intelligence, and the researchers studied a wide range of topics from astrophysics to personalized medicine. The rise of AI as a part of the workflow was a big trend, and 61% of all resources were used by projects which had, or planned to have, AI as a part of their work.

The pilot period was very successful in testing the system. The projects were able to generate a very high load on the system and thus confirm that it works with real workloads. Several projects were also able to make significant progress in their research during the piloting period. Due to the testing nature of the acceptance phase, some projects faced technical problems, but these experiences were also very important, since they help CSC improve the functionality of the system. In successful projects, the performance of Puhti was generally somewhat better than that of Sisu, both in terms of parallel scalability and single-core performance.

A new group of Grand Challenge pilot projects will be selected at the end of 2019 for the acceptance phase of the Mahti supercomputer. We look forward to seeing what kinds of scientific challenges await!

In conclusion

The Puhti supercomputer was opened to customers on 2 September 2019 and the Allas data management solution on 2 October 2019. Researchers working in Finnish universities and research institutes may apply for access rights and computing resources in the CSC Customer Portal at https://my.csc.fi.

The software offering on Puhti is currently more limited than on Taito, but new software is being installed almost daily. The user documentation is also being continuously extended. In addition, CSC will be organizing several training sessions on the use of the environment for both new and experienced users in 2019–2020; the first Puhti porting and optimisation workshop has already been held.

Further links:

CSC Customer portal: my.csc.fi
Information about CSC computing services: research.csc.fi
Information about the new infrastructure: research.csc.fi/dl2021-utilization
User documentation: docs.csc.fi

CSC supercomputers and superclusters

1989 Cray X-MP
1995 Cray C94
1997 Cray T3E
1998 SGI Origin 2000
2000 IBM SP Power3
2002 IBM p690 Power 4
2007 Cray XT4 (Louhi)
2007 HP Proliant CP400 (Murska)
2012 Cray XC40 (Sisu)
2013 HP Apollo 6000 XL230a/SL230s Supercluster (Taito)
2019 Atos BullSequana X400 (Puhti)
(2020 Atos BullSequana XH2000 (Mahti))

Authors
Sebastian von Alfthan and Jussi Enkovaara are high performance computing experts at CSC.

P.S.

You might have heard news about LUMI, the European pre-exascale computer that will be hosted by CSC. LUMI will be a huge addition to the computational resources available to Finnish researchers from 2021 onwards, but we will come back to the story of LUMI later.

 

Bloggers: Jussi Enkovaara and Sebastian von Alfthan. Topics: HPC, Research.

Artificial intelligence is a topic that needs no introduction right now. In the world of IT professionals, however, it is exceptional in the sense that it is interesting, approachable and even tantalizing to the general public as well. And so a great deal is being written about AI these days, everywhere and of every kind. Defining artificial intelligence is difficult even for researchers in the field, so understandably many ordinary participants in the conversation do not seem to know terribly well what they are talking about when they talk about AI.

I previously interviewed AI pioneer Professor Timo Honkela for a post published on the blog of the Finnish Association of Professional Board Members. One thought from that conversation has stuck with me, and I find myself returning to it again and again. Honkela compared artificial intelligence as a concept to supercomputers: both are terms that, as development progresses, keep escaping their own definition.

Supercomputers always mean the most powerful computers of their time. One could say that the Top500 list of the world's most powerful computers, maintained by computing centres, is at the same time the up-to-date definition of the word supercomputer. You are not super if you are not on the list. A single computer system rarely stays on the list for much more than five years, so in this sense the definition of a supercomputer is renewed roughly every five years.

When the Finnish term for artificial intelligence, tekoäly1, is broken into its parts, we can see that we are talking about something resembling human intelligence that nevertheless is not human intelligence. Often we can also simply talk about smartness: the context makes it clear that a smart home, for example, does not possess human intelligence but has instead been built using various AI systems.

The trend over time has been that, in each application area, the bar for what counts as resembling human intelligence has risen as technical development has progressed. For example, until quite recently car mechanics often spoke of smart boxes when referring to almost any electronic control module in a car. These boxes had replaced earlier, simpler mechanical devices and brought slightly more complex logic to controlling the car's functions, and that is what was called intelligence. Now that car development is approaching the point where self-driving robot cars are entering traffic, a control circuit that switches the headlights on automatically can hardly be considered an example of the automotive industry's efforts to develop intelligent systems.

AI research has run alongside the development of computers from the very beginning, but instead of a steady march of triumph, the history of AI has been a veritable roller coaster of rises and steep falls. As a special area of its own, however, the AI of computer games has developed more steadily through the decades.

Computer games show a familiar trajectory: a couple of simple logical rules were enough to move the ghosts that chased Pac-Man through a labyrinth in the 1980s, whereas the complex virtual terrains of modern games require sophisticated algorithms to find the best routes and steer characters past obstacles.

In 2016, an AI called AlphaGo managed to defeat a world-class player in the game of Go, which had been regarded as something of a last bastion of human game-playing intelligence. To achieve this, researchers had combined several advanced machine learning techniques. The road from Pac-Man to AlphaGo is long, and the methods used are on entirely different levels of sophistication, but the goal is the same: to create the illusion of an intelligent opponent.

In information technology, everything has always been driven by complex logic, and so the baseline for intelligence is comparatively high. Route-finding algorithms and the navigators built on them are everyday items for us now, although in their time they surely impressed many. Today, however, we no longer tend to regard an ordinary navigator as intelligent; instead, we expect that a navigator worthy of being called smart should at least understand speech and guess from half a word where we want to go.

Algorithms and automation have become commonplace. The great algorithms of computing may already have been written. For the development of information technology and digitalization to continue, a computer's intelligence can no longer pass through a programmer's fingers; systems must be able to learn by themselves. Today's artificial intelligence is therefore based on machine learning, that is, on computer systems that do not need ready-made rules but can learn them from data.

The current AI revolution was set in motion by an advance within machine learning, namely the development of so-called deep learning methods. The traditional limitation of machine learning has been the amount of data. If your library contains only one book, your general knowledge will not grow even if you read that same book every day. You will certainly learn by heart all the typesetting errors, the page numbers of every subsection and the locations of the coffee stains.

The availability of large masses of data and the emergence of new kinds of GPU accelerators well suited to learning tasks made it possible to stack shelf-kilometres of meaningful material in front of AI. These changes in the machine learning environment breathed new life into neural network methods developed long ago and provided the toolkit needed for taking the next steps on the AI front.

It is therefore good to remember that artificial intelligence is not in itself a method or a technology. The best definition, in my view, is that artificial intelligence means implementing on a computer functions that are considered intelligent. And as noted above, that bar of intelligence keeps rising in different application areas.

One can think of artificial intelligence as a compass direction. You can escape the winter frosts by heading south, and you can travel there by many different means: by ship and bus you can reach the spas of Estonia, while by plane you can reach Mediterranean destinations. On the other hand, if a record-cold winter hits the Greek islands, it does not help that, seen from Finland, you are already very far south. The journey must continue to North Africa in the hope of warmer weather.

We have likewise travelled in the direction shown by the AI compass using many different vehicles, from basic algorithmics to various machine learning and data mining methods. It can be said with certainty, however, that we are not very far along on that journey, and we will still need many new means of travel.

If you keep travelling south long enough, you eventually arrive. Roald Amundsen's expedition got there first, and the South Pole has since become the destination of many a Finnish ski trek as well. For artificial intelligence, the great philosophical question is whether AI is only a direction, or in the end also a destination. When we have travelled long enough, will we finally arrive? And what awaits us there?

1 In Finnish, a more accurate term than tekoäly would be keinoäly. When talking about devices, the prefix teko- refers to a device that imitates its model in both function and appearance, whereas terms like keinoäly follow the pattern of keinomunuainen (artificial kidney) and keinohorisontti (artificial horizon), which do not resemble the outward appearance of their models. Tekoäly is, however, the more established term, and the author is happy to settle for a slightly imprecise word form, as long as the understanding of the actual concept behind the word becomes clearer.

Blogger: Aleksi Kallio. Topics: Data, HPC.

If you follow CSC on social media you might have noticed a recent announcement about a new service based on OKD/Kubernetes called Rahti. This new service allows you to run your own software packaged in Docker containers on a shared computing platform. The most typical use case is web applications of all sorts. In this blog post I will provide additional context for the announcement and more detail and examples about what Rahti is and why it’s useful.

CSC has been running cloud computing services for a while. The first pilot systems were built in 2010 so the tenth anniversary of cloud computing at CSC is coming up next year. All of CSC’s previous offerings in this area – cPouta, ePouta and their predecessors – have been Infrastructure as a Service (IaaS) clouds. In this model, users can create their own virtual servers, virtual networks to connect those servers and virtual disks to store persistent data on the servers. This gives you a lot of flexibility as you get to choose your own operating system and what software to run on that operating system and how. The flip side is that after you get your virtual servers, you are on your own in terms of managing their configuration.

Rahti takes a different approach. Instead of a virtual machine, the central concept is an application. The platform itself provides many of the things that you would need to manage yourself in more flexible IaaS environments. For example:

  • Scaling up applications by adding replicas
  • Autorecovery in case of hardware failures
  • Rolling updates for a set of application replicas
  • Load balancing of traffic to multiple application replicas

Not having to manage these yourself means you can get your applications up and running faster and you don't have to spend as much time maintaining them. What enables this is standardization of the application container and the application lifecycle. In IaaS clouds you have a lot of choice in terms of how you want to make your application fault tolerant and scalable. There are many software products available that you can install and configure yourself to achieve this. With Rahti and other Kubernetes platforms, there is one standard way. This simplifies things greatly while still providing enough flexibility for most use cases.
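To give a flavour of what "scaling up applications by adding replicas" looks like in practice on a Kubernetes-based platform like Rahti, here is a minimal sketch using the official Kubernetes Python client. The namespace and deployment name are hypothetical, and on OKD you could equally use the web console or the command-line client; this is not taken from the Rahti documentation.

```python
# Minimal sketch: scale a Kubernetes Deployment by patching its replica count.
# The namespace and deployment name are hypothetical; this assumes your local
# kubeconfig already points at the cluster (e.g. after logging in to Rahti).
from kubernetes import client, config

config.load_kube_config()            # read credentials from ~/.kube/config
apps = client.AppsV1Api()

namespace = "my-project"             # hypothetical project/namespace
deployment = "my-web-app"            # hypothetical application name

# Ask the platform to run 4 replicas; it then handles scheduling, load
# balancing between them and replacing replicas lost to hardware failures.
apps.patch_namespaced_deployment_scale(
    name=deployment,
    namespace=namespace,
    body={"spec": {"replicas": 4}},
)
```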

Based on the description above you might think that Rahti fits into the Platform as a Service (PaaS) service model. While there are many similarities, traditional PaaS platforms have typically been limited in terms of what programming languages, library versions and tools are supported. It says so right in the NIST Definition of Cloud Computing: “The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider.” These limitations are largely not there in Rahti or other Kubernetes platforms: if it runs in a Docker container, it most likely also runs (or can be made to run) in Rahti. You are free to choose your own programming language and related libraries and tooling yourself.

Setting up Spark in Rahti

One of the big benefits of Rahti is that complex distributed applications that would be difficult to install and configure on your own on virtual machines can be packaged into templates and made available for a large number of users. This means figuring out how to run the application has to be done only once – end users can simply take the template, make a few small customizations and quickly get their own instance running. You are of course also free to create your own templates and run your own software.

One example of a distributed application that can be difficult to install and manage is Apache Spark. It is cluster software meant for processing large datasets. While it is relatively simple to install it on a single machine, using it that way would defeat the point of running Spark in the first place: it is meant for tasks that are too big for a single machine to handle. Clustered installations, on the other hand, mean a lot of additional complications: you need to get the servers to communicate with each other, you need to make sure the configuration of the cluster workers is (and stays) essentially identical and you need to have some way to scale the cluster up and down depending on the size of your problem – and the list goes on.

Let’s see how one can run Spark in Rahti. The template that we use in Rahti is available on GitHub and the credit for it goes to my colleagues Apurva Nandan and Juha Hulkkonen. And yes, I know that is actually the Hadoop logo.

First select “Apache Spark” from a catalog of applications:

You can also find other useful tools in the catalog such as databases and web servers. After selecting Apache Spark, you’ll get this dialog:

Click next and enter a few basic configuration options. There are many more that you can customize if you scroll down, but most can be left with their default values:

After filling in a name for the cluster, a username and a password, click “Create” and go to the overview page to see the cluster spinning up. After a short wait you’ll see a view like this:


The overview page shows the different components of the Spark cluster: one master, four workers and a Jupyter Notebook as a frontend to the cluster. These run in so-called "pods", which are collections of one or more containers that share the same IP address. Each worker in the Spark cluster is its own pod, and the pods are distributed by Rahti onto separate servers.

From the overview page you can get information about the status of the cluster, monitor resource usage and add more workers if needed. You can also find a URL to the Jupyter Notebook web interface at the top and if you expand the master pod view you can find a URL to the Spark master web UI. These both use the username and password you specified when creating the cluster.

If you need a more powerful cluster you can scale it up by adding more workers. Expand the worker pod view and click the up arrow next to the number of pods a few times:

You can then follow the link from the overview page to Jupyter Notebook which acts as a frontend for the Spark cluster.
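Inside the notebook, using the cluster looks like ordinary PySpark code. The snippet below is a minimal sketch, not from the original post, that estimates π with a Monte Carlo simulation distributed over the workers; on Rahti's template the notebook may already provide a preconfigured Spark session, in which case the builder line can be simplified.

```python
# Minimal PySpark sketch: estimate pi by distributing random sampling over the
# Spark workers. Intended to be run inside the Jupyter Notebook frontend.
import random
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pi-estimate").getOrCreate()
sc = spark.sparkContext

def inside_unit_circle(_):
    x, y = random.random(), random.random()
    return x * x + y * y <= 1.0

samples = 10_000_000
hits = sc.parallelize(range(samples), numSlices=100).filter(inside_unit_circle).count()

print(f"pi is approximately {4.0 * hits / samples:.4f}")
```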

And that’s all there is to it! The process for launching other applications from templates is very similar to the Spark example above. The plan for the future is to add more of these templates to Rahti for various types of software in addition to the ones that are already there.

If you’re interested in learning more about Rahti, you can find info at the Rahti website or you can contact servicedesk@csc.fi.

Photo: Adobe Stock

 

 

Blogger: Risto Laurikainen. Topics: Science and research, Data, HPC.

Recently, the CSC policy for free and open source software was posted without any celebration. It is under our Github organization and you can check it out at:

https://github.com/CSCfi/open-source-policy

Our toned-down approach stemmed from the fact that not much changed with the adoption of the policy. It pretty much stated the already established approach to endorsing open source software in our daily work. The paths of CSC and open source have crossed from the very beginning, when we were in the happy position of offering the platform for distributing the very first version of the Linux operating system – and we were of course early adopters of Linux in our operations.

CSC is a non-profit state enterprise embracing free and open source software throughout its operations and development. For us, open source software together with open data and open interfaces are the essential building blocks of sustainable digital ecosystems. CSC employees haven't been shy about using and producing open source, but we still wanted to codify the current de facto practices and to encourage employees to keep supporting open source.

The major decision when formulating the policy was to put special emphasis on collaboration. We’ve been involved in dozens of open source projects and seen the realities of community building efforts. Community building is hard work.

The policy aims to promote practices that best support collaboration and contribution within the open source community. We find that the best way to do this is to embrace the licensing practices of the surrounding community. For some types of applications it might mean GPL licensing, whereas increasingly the norm has been to use permissive licenses and not to enforce contributor agreements.

We have been happy contributors to projects such as OpenStack and have been extremely delighted to also be on the receiving side when working as the main developers of software such as Elmer and Chipster. Every contribution counts, and even the smallest ones usually carry some expertise or insight that broadens the scope of the project.

Finally, the policy aims to be concise and practical. It should offer guidance for the everyday working life of CSC people who are part of the large open source community. So we did not want to make it a monolithic document written in legal language that would have been foreign to almost all of the developers in the community.

Happy coding!

P.S. If you would like to use the policy or parts of it for your organization or project, please do so! It is licensed under CC-BY 4.0, so there are no restrictions on reuse. Obviously, this is the licensing recommendation for documentation we give in the policy!

Photo: Adobe Stock

Blogger: Aleksi Kallio. Topics: Software, Science and research, HPC.

Our trusted workhorse Sisu is ending its duty this month after a respectable almost seven years of operation.

Sisu started its service in the autumn of 2012 as a modest 245 Tflop/s system featuring 8-core Intel Sandy Bridge CPUs, reaching its full size in July 2014 with a processor upgrade to 12-core Intel Haswell CPUs and an increase in the number of cabinets from 4 to 9. The final configuration totalled 1688 nodes and 1700 Tflop/s of theoretical performance. At best, it was ranked the 37th fastest supercomputer in the world (Top500, November 2014 edition). It remained among the 100 fastest systems in the world for three years, dropping to position #107 on the November 2017 list.

Throughout its service, Sisu proved itself a very stable and performant system. The only major downtime took place when a disaster took down the shared Lustre filesystem.

Over the course of the years, Sisu provided over 1.7 billion core hours for Finnish researchers, playing a major role in several success stories in scientific computing in Finland. Just a couple of examples:

In addition to being a highly utilized and useful Tier-1 resource, it acted as a stepping stone for several projects that obtained highly competitive PRACE Tier-0 access on the Piz Daint system in Switzerland and other top European supercomputers. Without a credible national Tier-1 resource, establishing the skills and capacities for using Tier-0 resources would be hard if not impossible.

Sisu also spearheaded several technical solutions. It was among the first Cray XC supercomputers in the world with the new Aries interconnect. In the second phase it was equipped with Intel’s Haswell processors weeks before they were officially released. It also heralded a change in hosting for CSC: instead of being placed in Espoo next to the CSC offices, the machine was located in an old paper mill in Kajaani. This change has brought major environmental and cost benefits, and has been the foundation for hosting much larger machines.

Sisu was the fastest computer in Finland throughout its career, until last month when CSC’s new cluster system Puhti took over the title. Puhti will be complemented by the end of this year by Sisu’s direct successor Mahti, which will again hold the crown for some time. Puhti is currently in pilot use and will become generally available during August, and Mahti at the beginning of next year. Sisu has done its duty, and we wish it a happy retirement. Hats off!

 

Blogger: Jussi Heikonen Pekka Manninen Sebastian von Alfthan Blog Topic: Science and research HPC CSC: Blog News Categories: Research Themes: Laskennallinen tiede

Variant Calling

Modern next-generation sequencing technologies have revolutionized research on genetic variants, the understanding of which holds great promise for finding therapeutic targets for human diseases. Many human diseases, such as cystic fibrosis, sickle cell disease and various kinds of cancers, are known to be caused by genetic mutations. The identification of such mutations helps us diagnose diseases and discover new drug targets. Other relevant research topics include human population history, species origins, and animal and plant breeding.

Variant calling refers to the process of identifying variants from sequence data. There are four main kinds of variants: single nucleotide polymorphisms (SNP), short insertions or deletions (indels), copy number variations (CNV) and structural variants (SV) (Figure 1).

Figure 1 The four most common types of variants.

The industry gold standard for variant calling: GATK and Best Practices

To offer a highly accurate and repeatable variant calling process, the Broad Institute developed a variant calling toolset and a step-by-step protocol for using it: the Genome Analysis Toolkit (GATK) and the Best Practices workflows.

GATK is a multiplatform-capable toolset focusing on variant discovery and genotyping. It contains the GATK variant caller itself and also bundles other genetic analysis tools such as Picard. It comes with a well-established ecosystem that enables it to perform multiple tasks related to variant calling, such as quality control, variant detection, variant filtering and annotation. GATK was originally designed for, and is most suitable for, germline short variant discovery (SNPs and indels) in human genome data generated by Illumina sequencers. However, the Broad Institute keeps extending its functionality: GATK now also supports copy number and structural variation discovery, both germline and somatic variant discovery, and genome data from other organisms and other sequencing technologies.

Figure 2 The GATK variant calling process.

GATK Best Practices is a set of reads-to-variants workflows used at the Broad Institute. At present, Best Practices contains six workflows: Data Pre-processing, Germline SNPs and Indels, Somatic SNVs and Indels, RNAseq SNPs and Indels, Germline CNVs and Somatic CNVs. (You can find the Best Practices introduction on the GATK forum and the code on GitHub.)

Although the workflows differ slightly from one another, they all share three main steps: data pre-processing, variant discovery, and additional steps such as variant filtering and annotation. (1) Data pre-processing is the starting step for all Best Practices workflows. It turns raw FASTQ or unmapped BAM files into analysis-ready BAM files that are aligned to the reference genome, duplicate-marked and sorted. (2) Variant discovery is the key step of variant calling. It turns analysis-ready BAM files into variant calls in VCF or other structured text-based formats. (3) Additional steps are not necessary for all workflows; they are tailored to the requirements of each workflow's downstream analysis. Variant filtering and annotation are the two most common choices.

GATK pipelining solution: WDL and Cromwell

It is great and time-saving to have scripts that run analysis pipelines automatically. In the past, people used Perl or Scala for this, but those have a steep learning curve for non-IT people. The Broad Institute addressed this problem by introducing a new open source workflow description language, WDL. With a WDL script you can easily define tasks and link them in order to form your own workflow, using a simple syntax and human-understandable logic. WDL is simple but powerful: it contains advanced features and control components for parallelism and for controlling run time and memory. WDL is also a cross-platform language that can be run both locally and in the cloud.

Cromwell is the execution engine for WDL. It is written in Java and supports three types of platforms: a local machine, a local cluster or compute farm accessed via a job scheduler, or the cloud. Its basic runtime requirement is Java 8.
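In practice, launching a workflow in Cromwell's one-off "run" mode is a single Java command. The sketch below simply wraps that command in Python; the file names are placeholders, and it assumes cromwell.jar and a Java runtime are available in the working environment.

# A minimal sketch of launching a WDL workflow with Cromwell in "run" mode.
# Assumes cromwell.jar, hello.wdl and hello.inputs.json exist in the working
# directory and that Java 8 or newer is on the PATH.
import subprocess

cmd = [
    "java", "-jar", "cromwell.jar",   # the Cromwell execution engine
    "run", "hello.wdl",               # the WDL workflow to execute
    "--inputs", "hello.inputs.json",  # workflow inputs as JSON
]

# Run Cromwell and fail loudly if the workflow does not finish successfully.
subprocess.run(cmd, check=True)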

Write and run your own WDL script in 5 minutes with this quick start guide.

Run GATK4 on CSC Pouta Cloud and Taito

GATK3 was the most used version in the past. GATK4, which takes advantage of machine learning algorithms and Apache Spark, offers faster speed, higher accuracy, better parallelization and optimization for cloud infrastructure.

The recommended way to run the GATK Best Practices is to combine GATK4, WDL scripts, the Cromwell execution engine and Docker containers. At CSC, the Best Practices workflows are written in WDL and run by Cromwell on the Pouta cloud, and the related tools such as GATK4, SAMtools and Python are called as Docker images to simplify software environment configuration.

CSC provides a large amount of free computing and storage resources for academic use in Finland and facilitates efficient data transfer among its multiple computing platforms. cPouta and ePouta are the OpenStack-based IaaS cloud services at CSC: cPouta is the main production public cloud, while ePouta is a private cloud suitable for sensitive data. Both offer multiple virtual machine flavors, a programmable API and a web UI, which let users create and control their virtual machines easily. They are suitable for various kinds of computational workloads, from HPC to genomic analysis.
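As an illustration of the programmable API, the following hedged sketch uses the openstacksdk Python library to launch a virtual machine; the cloud entry, image, flavor, network and key pair names are placeholders that depend on your own Pouta project.

# Hypothetical sketch: launching a virtual machine in a Pouta project with
# openstacksdk. The "cpouta" cloud entry must exist in clouds.yaml, and the
# image, flavor, network and key pair names below are placeholders.
import openstack

conn = openstack.connect(cloud="cpouta")   # credentials are read from clouds.yaml

server = conn.create_server(
    name="gatk-worker",
    image="CentOS-7",              # placeholder image name
    flavor="standard.xlarge",      # placeholder flavor name
    network="project-network",     # placeholder project network
    key_name="my-keypair",         # existing SSH key pair in the project
    wait=True,                     # block until the VM is ACTIVE
)
print(server.name, server.status)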

At CSC, the GATK4 Best Practices germline SNPs and indels discovery workflow has been optimized and performance-benchmarked on Pouta virtual machines (FASTQ, uBAM and GVCF files are accepted as input). A somatic SNVs and indels discovery workflow is coming soon.

Besides using cloud infrastructure for GATK by launching a virtual machine in Pouta with this tutorial, one can also use GATK in a supercomputing cluster environment (e.g. on Taito, with this tutorial) by loading the GATK module as below:

module load gatk-env
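Once the module is loaded, a single Best Practices step, such as germline calling with HaplotypeCaller in GVCF mode, could look roughly like the sketch below; the file names are placeholders, and the full set of options is described in the GATK user guide.

# Hedged sketch of one germline calling step with GATK4 HaplotypeCaller.
# File names are placeholders; run after "module load gatk-env" so that the
# gatk launcher is available on the PATH.
import subprocess

subprocess.run(
    [
        "gatk", "HaplotypeCaller",
        "-R", "reference.fasta",          # reference genome
        "-I", "sample.sorted.dedup.bam",  # analysis-ready BAM from pre-processing
        "-O", "sample.g.vcf.gz",          # per-sample GVCF output
        "-ERC", "GVCF",                   # emit a GVCF for later joint genotyping
    ],
    check=True,
)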

Detailed usage instructions can be found in the GATK user guide, and the materials from the GATK course held at CSC in May 2019 are available on the “Variant analysis with GATK” course page.

You are welcome to test the GATK tools in the CSC environment, and our CSC experts are glad to help you optimize run parameters, set up the virtual machine environment, estimate sample processing times and offer solutions to common error messages.

Photo: Adobe Stock

Blogger: Shuang Luo Blog Topic: Data HPC CSC: Blog News Categories: Research

During the past years, sensitive data has become one of the hottest topics in Finnish scientific data management discussions, not least thanks to the European General Data Protection Regulation. At the same time, for nearly five years now, CSC has provided the ePouta cloud platform for sensitive data processing needs, with quite substantial computing and storage capacity. From the ground up, this virtual private IaaS cloud solution has been designed to meet the national requirements for IT systems handling protection level III (ST III) data.

While ePouta has been successful in providing our institutional customers a safe and robust platform for their sensitive data processing, it has lately become very clear that something more is desperately needed; something which is more easily adopted and accessed, something for individual researchers and research groups, and something more collaborative.

Now, here a problem arises: by definition, sensitive data contains information which should only be processed either with explicit consent or a legitimate permission, and there are certain rules for such processing. Probably the most notable of those rules, from a researcher's perspective, are the requirements for data minimisation, pseudonymisation, encryption, safe processing and data disposal after use.

Data minimisation and pseudonymisation relate directly to dataset definition. Minimisation means that only the data that is absolutely needed should be processed. For example, if the dataset includes information about persons' age but that information is not needed for the research, it should not be included in the dataset and should be removed from it before processing.

Pseudonymisation is a de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms.

Pseudonymisation differs from anonymisation in that pseudonymised data can be restored to its original state with the addition of information which then allows individuals to be re-identified. Such re-identification codes must be kept separate from the pseudonymised data. Clearly then, these topics are something that the data owner or the researcher should take care of, but the rest are more technical matters that CSC should help with. And this is exactly where our sensitive data services step in.
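Purely as an illustration (this is not part of CSC's services), pseudonymising a column of direct identifiers while keeping the re-identification key in a separate file can be as simple as the following sketch; the file and column names are assumptions.

# Illustration only: replace direct identifiers with random pseudonyms and
# store the re-identification mapping in a separate, better-protected file.
import csv
import secrets

mapping = {}  # real identifier -> pseudonym; keep this apart from the data!

with open("records.csv", newline="") as src, \
     open("records_pseudonymised.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        person_id = row["person_id"]          # assumed identifier column
        if person_id not in mapping:
            mapping[person_id] = secrets.token_hex(8)
        row["person_id"] = mapping[person_id]
        writer.writerow(row)

with open("reidentification_key.csv", "w", newline="") as keyfile:
    csv.writer(keyfile).writerows(mapping.items())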

You know the rules and so do I

The centerpiece of the sensitive data services is storage. The data should be stored in such a way that unauthorised access is virtually impossible, yet at the same time legitimate access is as easy as possible. Furthermore, the data should not disappear, become corrupted, or leak out while being stored and used. Data owners should be able to store their sensitive data easily and share it with only those users they grant permissions to.

CSC’s Sensitive Data Archive service is designed to fulfil all the requirements mentioned above and even some more. Instead of providing just regular storage space, the new Sensitive Data Archive adds a service layer between the storage and the user applications. This service layer, called the Data Access API, takes care of encryption and decryption of data on behalf of the user, which also offloads the encryption key management tasks from users.

Furthermore, the Data Access API ensures that the secured data is visible and accessible only to those users who have been granted access to it by the data owner. The processing environment, the access mechanism and the sensitive data storage are all logically and physically separated from each other in order to ensure maximum security. This also makes the sensitive data platform flexible, since compute and storage are not dependent on each other, yet the glue between them still makes it seamless and transparent for the user.

Take my hand, we’re off to secure data land

So, how does it work for the user then? Let’s first assume that the dataset a user is interested in has already been stored in the Sensitive Data Archive. The data is safely stored and findable by its public metadata, but it is by no means accessible at this point: the user needs a permission for the dataset she needs for her research. Instead of a traditional paper application sent to the dataset owner, she applies through a web portal to a Resource Entitlement Management System, REMS, which circulates the application with the data owner(s). Once the application has been accepted, a digital access token is created, which is comparable to a passport and visa granting entry into a foreign country.

Now, when logging in to a sensitive data processing system, this digital access token is transparently passed along with the login information to the compute system. The Sensitive Data Archive’s Data Access API queries the token and, based on the information in it, presents the dataset in a read-only mount point on the local file system. Even though the files look just like regular files on your file system, they are actually a virtual presentation of the actual files. Not a single file has been copied onto the compute system, yet they are accessible like any regular file. When a file operation is performed on a dataset file, the Data Access API fetches just the requested bits from the storage, decrypts them and hands them out to the requesting process — just like any other operating system call to any other file.
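Purely to illustrate the idea from a client's point of view (the endpoint paths and response fields below are invented and do not describe the actual Data Access API), the token-based lookup behind the scenes is conceptually something like this:

# Invented endpoints and fields, for illustration only: every request carries
# the access token granted through REMS, and only the datasets covered by
# that token are visible to the caller.
import requests

API = "https://sd-archive.example.org"   # hypothetical service URL
TOKEN = "..."                            # access token issued after REMS approval
headers = {"Authorization": f"Bearer {TOKEN}"}

# List the datasets this token grants access to (hypothetical endpoint).
datasets = requests.get(f"{API}/datasets", headers=headers).json()

# Fetch one file from the first dataset; decryption happens on the service side.
first = datasets[0]
response = requests.get(
    f"{API}/datasets/{first['id']}/files/{first['files'][0]}", headers=headers
)
with open("dataset_file.bin", "wb") as out:
    out.write(response.content)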

One added benefit directly derived from the use of access tokens is that they have a validity period — or they can be revoked by the data owner at any given time. Once the token expires, the Data Access API cuts off access to the files; they simply disappear from the compute system in a puff. The validity period can also easily be extended. Thus, the data owner retains full control over the data she stored in the Sensitive Data Archive.

For the data owner, the procedure for storing the data is — if possible — even simpler. You just need to define metadata for your dataset, enter it (either manually or automatically through an API) into REMS and then upload your data. The upload tool will encrypt the data and send it to the archive, which will re-encrypt the data so that it truly is secure. Even you, as the data owner and submitter, are not able to read it back without first granting yourself a permission and using the Data Access API on our sensitive data compute systems.

Something old, something new, something browser’ed

So far so good, but the issue has always been that ePouta is too inflexible for individuals and smaller research groups. The good news is that the Data Access API has been successfully demonstrated in ePouta and will become a full-blown service later this year.

The even better news is that, along with it, there will be a whole new service for ePouta: a remote desktop connection for individual users.

Each user, or a group of users if that's the case, will get their very own private virtual cloud resource with the Data Access API. And the best part is that it does not require any client software installations on the user's end. A reasonably modern web browser is enough; even a smartphone's browser is sufficient (I have tested it, it works, even on 4G — but really, it is close to useless on such a small screen with touch input only).

Are we there yet?

While we haven’t really figured out yet how the project model goes, how users can install the software they need — it is ePouta without external connections — and some other pretty important parts of the service processes, the technology is already there and becoming mature and robust enough that we’re confident in saying that ePouta Remote Desktop will be a publicly available service later this year.

The end credits (which no one reads)

Early on, with much planning put into our sensitive data model, we realised that it is vital that we do not just develop a fancy new platform and then try to make everyone use it. Instead, we teamed up and collaborated with partners with similar ambitions, focusing on making the service as flexible as possible and on using open standards as much as possible.

Developed in a joint effort with the Nordic e-Infrastructure Collaboration’s (NeIC) Tryggve project and the Centre for Genomic Regulation (CRG), the Data Access API is part of the Federated EGA concept, designed to provide general, distributed and secure storage for genomic data alongside the European Genome-phenome Archive (EGA). But while genomic data has been the driving factor, the API is actually data type agnostic and works for any data type, e.g. text, binary or video.

In our future dreams anyone could install the Sensitive Data Archive and host their sensitive data by themselves but still make it available for access in ePouta Remote Desktop — something we’ve already tested with our Swedish partners, accessing two separate datasets stored in Finland and Sweden, used in ePouta Remote Desktop with a mobile phone at Oslo Airport…

Image: Adobe Stock

Blogger: Jaakko Leinonen Blog Topic: Data CSC: Blog News Categories: Research Themes: Theme Front Page Viewpoints: Datan arvon maksimointi Datan säilytys ja turvaaminen

March has been the month of the Spring School in Computational Chemistry for the last eight years. This time the school was overbooked already in November, so if you want to join next year, register early.

Consequently, we decided to accept more participants than before, resulting in tight seating and parallel sessions also for the last day's hands-on exercises. 31 researchers from Europe and beyond spent four science-packed days in occasionally sunny Finland.

Three paradigms in three days

The foundations of the school – the introductory lectures and hands-on exercises on (classical) molecular dynamics and electronic structure theory – have been consistently liked and found useful, and have formed the core of the programme with small improvements.

For the last four years we've integrated the latest research paradigm, i.e. data-driven science, also known as machine learning (ML), into the mix. This approach has been welcomed by the participants, in particular as the lectures and hands-on exercises given by Dr. Filippo Federici Canova from Aalto University have been tailored for computational chemistry and cover multiple approaches to modelling data. ML is becoming increasingly relevant, as one of the participants, Mikael Jumppanen, noted in his flash talk, quoting another presentation from last year: "Machine learning will not replace chemists, but chemists who don't understand machine learning will be replaced."
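To give a flavour of such exercises, a generic sketch (not the actual course material) of one common approach, learning molecular energies from descriptor vectors with kernel ridge regression in scikit-learn, could look like this:

# Generic illustration with synthetic data: learn a mapping from molecular
# descriptor vectors to energies using kernel ridge regression.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                  # synthetic descriptors
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)   # synthetic "energies"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=0.1)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))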

The ML day culminated in the sauna lecture given by Prof. Patrick Rinke from Aalto University. He pitted humans against different artificial intelligence "personalities". The competition was fierce, but we humans prevailed by a small margin – partly because we were better at haggling over the scoring.

Food for the machines

This year we complemented the ML session with means to help create data to feed the algorithms. Accurate models require a lot of data, and managing hundreds or thousands of calculations quickly becomes tedious.

Marc Jäger from Aalto University introduced the relevant concepts and the pros and cons of using workflows, spiced with the familiar hello world example. It was executed with FireWorks, a workflow manager popular in materials science. Once everyone had succeeded in helloing the world, Marc summarized that "this was probably the most difficult way of getting those words printed", but the actual point was that if there is a workflow, or a complete workflow manager, which suits your needs, someone else has already done a large part of the scripting work for you and you can focus on the benefits.
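For reference, the hello world roughly follows the FireWorks introductory tutorial and looks something like the sketch below; it assumes a MongoDB instance that the LaunchPad can reach.

# A sketch along the lines of the FireWorks introductory tutorial.
# Assumes a reachable MongoDB instance for the LaunchPad to store workflows in.
from fireworks import Firework, FWorker, LaunchPad, ScriptTask
from fireworks.core.rocket_launcher import launch_rocket

launchpad = LaunchPad()                      # connects to a local MongoDB by default
launchpad.reset("", require_password=False)  # wipes the workflow database!

# One firework wrapping a single shell task.
firework = Firework(ScriptTask.from_str('echo "hello world"'))
launchpad.add_wf(firework)

# Pull the job from the LaunchPad and run it on this machine.
launch_rocket(launchpad, FWorker())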

Workflow managers of course aren't a silver bullet beneficial in all research, but in case you need to run lots of jobs or linked procedures, automating and managing them with the right tool can increase productivity, document your work and reduce errors.

What to do with the raw data?

How do you make sense of the gigabytes of data produced by HPC simulations? It of course depends on what data you have. The School covered multiple tools to make sense of your data.

Visual inspection is a powerful tool in addition to averages, fluctuations and other numerical comparisons. MD trajectories and optimized conformations were viewed with VMD, electron densities and structures were used to compute bonding descriptors with Multiwfn and NCIPLOT, and a number of Python scripts employing matplotlib for result visualization were given as real-life examples of current tools.
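A typical real-life snippet of the last kind plots a quantity such as RMSD along a trajectory; the input file name and column layout below are assumptions.

# Assumed input: a two-column text file with time (ps) and RMSD (nm),
# as produced by common MD analysis tools.
import numpy as np
import matplotlib.pyplot as plt

time, rmsd = np.loadtxt("rmsd.dat", unpack=True)

plt.plot(time, rmsd, lw=1)
plt.xlabel("Time (ps)")
plt.ylabel("RMSD (nm)")
plt.title("Backbone RMSD along the trajectory")
plt.tight_layout()
plt.savefig("rmsd.png", dpi=150)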

To brute force or not to brute force?

Although computers keep getting faster, brute forcing research problems is not always the right way. In one of the parallel tracks on the last day, Dr. Luca Monticelli built on top of the MD lectures of the first day by presenting 6+1 enhanced sampling techniques to enable proper study of rare events.

The last one, coarse graining, strictly speaking is not an enhanced sampling method, but as it is orders of magnitude faster than atomistic simulations it can be used to equilibrate a system quickly enabling switching to atomistic detail from truly independent configurations.

Posters replaced with flash talks

The previous Spring Schools have included a poster session to give participants the possibility to present their own research and to facilitate discussion with other participants and lecturers. Posters have helped to discover potential collaborations and new ideas to apply in one’s own research.

There is a lot of potential for collaboration, as the School participants come from highly diverse backgrounds, as shown in the word cloud below. The word cloud was created from the descriptions filled in by the participants at registration.

Word Cloud: Scientific background of the participants.
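For the curious, a word cloud like this can be generated with a few lines of Python using the wordcloud package; the input file below is an assumption (free-text registration answers collected into a plain text file).

# Sketch: build a word cloud from free-text registration answers.
from wordcloud import WordCloud

with open("registration_answers.txt") as f:
    text = f.read()

cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
cloud.to_file("background_wordcloud.png")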

One participant suggested in last year's feedback that the poster session be replaced with flash talks, which we now did. Each participant was asked to provide one slide introducing their background, skills and scientific interests, and the slides were used in three-minute flash talks given to everyone else. The feedback was very positive, so we will likely continue with flash talks in 2020.

Networking with researchers is yet another motivation to participate in the school. Philipp Müller from Tampere University of Technology took the initiative and proposed a LinkedIn group for the participants to keep in contact also after the school. The group was created on the spot, and most of the participants have already signed up.

As potential collaborations are discovered, the HPC-Europa3 programme, also presented at the School, can be used to fund 3–13 week research visits. Or, if you choose your research visit to take place in Finland in March 2020, you could also participate in the School at the same time.

To whom do the participants recommend the School?

For the first time we asked the participants who they think would benefit from participating in the school. The answers ranged from any undergraduate or postgraduate student in the field to everyone who needs any computational skills. One participant also confessed that spending some time learning elementary Python (as suggested) before the School would have been useful. The computational tools known to the participants at registration are collected in the picture below.

Word Cloud: Computational tools used by the participants.

The feedback also emphasized the quality of the hands-on sessions, the social events and the overall organization, while the pace of teaching also sparked some criticism. This is understandable, as the School covers a wide range of topics and therefore it is not possible to go very deep into details. Also, as the backgrounds of the participants are heterogeneous, some topics are easy for some but new to others. This has been partially mitigated by organizing the hands-on sessions of the first two days in three parallel tracks with different difficulty levels.

The great majority of the participants were satisfied with all aspects of the school. Indeed, our original aim has been to introduce the most important fundamental methods and selected tools so that the participants are aware of them; when an opportunity to apply them comes, a deeper study will be necessary anyway.

Materials available online

Most of the lectures and hands-on materials are available on the School home page. The hands-on exercises in particular are also suitable for self-study – take a look!

More about the topic:

 

Blogger: Atte Sillanpää Blog Topic: Science and research HPC CSC: Blog News Categories: Research Themes: Laskennallinen tiede

CSC develops, integrates and offers high-quality digital services and is committed to good data management. We believe that the future of the world and people will become better as a result of research, education and knowledge management. That's why we promote them to the best of our abilities and develop and provide internationally high-quality digital services. CSC’s strategic goals include enabling world-class data management and computing and maximizing the value of data.

Data is often too important and valuable to be handled carelessly. In their work our customers, especially researchers, are required to adhere to the FAIR data principles and to make their data Findable, Accessible, Interoperable and Re-usable. Furthermore, they need tools to enable proper data citation. This affects us as a service provider and puts expectations on our data management service development.

Our revised data policy and new policy for persistent identifiers support us in achieving our strategic goals and promote best data management practices. These newly released policies oblige us to take appropriate institutional steps to help customers safeguard the availability, usability and retention of their data, and help us assure compliance with all applicable laws and regulations as well as internal requirements with respect to data management. The policy for persistent identifiers (often referred to as PIDs; the most commonly known are probably DOI and URN identifiers) enables the creation and management of globally unique, unambiguous identifiers at CSC for our own processes and for those of our customers.

These documents are, in their first versions, mainly written for research dataset management, but as they represent generic principles of good data management, they aim to cover and guide all data and information management at CSC, including both customer-owned and CSC-owned data. In addition, these policies are living documents that will be reviewed regularly and revised when needed.

More information

CSC’s Data Policy

Data Policy in Finnish

CSC’s PID Policy

PID policy in Finnish

Blogger: Jessica Parland-von Essen Minna Ahokas Blog Topic: Data CSC: Blog News Categories: Research Viewpoints: Datan arvon maksimointi Datan säilytys ja turvaaminen

As the 2019 parliamentary election year gets under way, enabling continuous learning for citizens has for quite some time been at the centre of the education policy debate.

As working life keeps changing, society should offer ever more flexible opportunities for citizens to retrain (re-skilling). At the same time, the need to support competence development within working life (up-skilling) is also growing. Instead of linear and previously often separate study and work careers, studying and working will in the future be interwoven more tightly than ever.

"Continuous learning is already being talked about as the next education reform."

More generally, the lifelong development of the adult population's competences is, alongside the education of the young, becoming an increasingly important target for societal development. Continuous learning is already being talked about as the next education reform. Cooperation between the levels of education will therefore grow even closer in the future, and the need for interoperable IT services will increase.

There are, however, still many questions to be solved, not least the funding models. It will also be interesting to see to what extent, supported by new kinds of digital services, lifelong learning, long a staple of ceremonial speeches, can be turned into reality and into part of every citizen's everyday life.


A personal competence profile at the core of developing one's own competences

CSC is currently coordinating an EU project called Compleap, funded by the European Commission's DG Connect, which is building solutions to this very topical need for continuous learning.

Together with the other partners of the consortium – the Finnish National Agency for Education, the University of Oulu, the Jyväskylä Educational Consortium Gradia and DUO, an agency under the Dutch Ministry of Education – the project is designing, among other things, a new kind of digital competence profile that crosses the boundaries between levels of education, and a prototype of it.

"A digital competence profile would bring together information that is currently scattered across several different, often education-level-specific profiles."

Connected to national education offering services, such as Opintopolku.fi in Finland, a service like this would significantly support individuals in planning their study and career paths at different stages of life. Through a more tailored and personalised service, it would be easier for an individual to find suitable degree education and, in the future, also other smaller study modules such as parts of degrees or competence badges.

A digital competence profile would bring together information that is currently scattered across several different, often education-level-specific profiles. It would thus also support the everyday work of study and career counsellors by freeing up time for personal guidance, instead of the individual's starting situation having to be mapped out over and over again.

The development of the prototype of the new service is well under way. The service was presented to our broad and diverse group of stakeholders at our seminar at the Finnish National Agency for Education on 4 December.
 



A prototype of the digital competence profile. Illustrative image.


Cooperation across administrative and educational boundaries

Alongside developing the actual toolbox that supports the development of an individual's competences, the Compleap project is also building a broader overall description of the ecosystem of already existing digital services. Enterprise architecture methods and modelling are used to make the big picture of digital services visible and thereby to promote the interoperability of the services.

The goal of the architecture work is to produce an EU-level framework that could support the development of digital services both at the EU level and in individual member states.

In Finland, the work done so far has shown that enterprise architecture thinking and shared visualisations can smooth cooperation across administrative and educational boundaries and promote interoperability. In this way we can build a world where different actors together produce a coherent set of services for the benefit of the most important party, the lifelong learner, instead of individual services being developed in isolation, unaware of or indifferent to what others are doing.

The solutions developed in the project, which ends at the end of 2019, will be piloted during this year in Finland and internationally. You can follow the progress of the project on its website and on Twitter at @comp_leap.


More information:

Antti Laitinen
Project Manager, CSC
tel. 050 381 8669
antti.laitinen(at)csc.fi
 

 

BLOG IMAGE: THINKSTOCK

Blogger: Antti Laitinen Blog Topic: IT Management CSC: Blog News Categories: Education


Last year I wrote my first cloud prediction blog post. I have to be honest, predicting “cloud” is a bit of a daunting task, so this year I'll explicitly focus on a more specific area: Cloud computing in research.

Please keep in mind that these are the predictions of a polite yet opinionated person, and not the company's.

First prediction: In 2019, the term Cloud will continue to be used both way too narrowly and way too widely. “Cloud means Kubernetes” and “Cloud means IaaS” are on the narrower spectrum, while European Open Science Cloud is on the wider one.
 

European Open Science Cloud (EOSC)

Let’s start with a big, visible topic. I can feel the heat of the directors breathing down my neck when I'm writing this section. So, yes, this is a quite political subject. However, EOSC is of course tightly connected with scientific cloud use, and these are my predictions, so let’s get started.

Congratulations to EOSC, which was officially launched last November! For many people, EOSC was some large ephemeral formless entity that does... something, on the political level. EOSC does have a list of actual services. But that’s pretty much it, a list of links. Should the cloud word even be included?

Well, yes and no. It’s understandable that kick-starting something on this scale takes a while, and as a lot of my colleagues can testify, I’m a big fan of “release early”. However, is it a “Cloud” yet?

No.

EOSC needs to solve the basic issues with a federated cloud marketplace. How are resources granted and paid for? How is the reporting done? Are the SLAs common? What AAI systems and principles are there, and can both users and providers integrate with them?

The services must be easily consumable by the users, and there must be clear integration points for the providers. Being a service catalog, and relying on tons of different contact points between vendors and users, where all pairings have different processes, is neither flexible nor fast. The process planning must be done thoroughly, with an eye on automatability. The resource providers must be in focus in this process planning, but they are less important than the end users.

 

 
”In 2019, the term Cloud will continue to be used both way too narrowly and way too widely."
 


Having the framework in place to connect services to user communities is a great goal. The real "C" of EOSC is a cloud, with at least some of the NIST cloud characteristics (on-demand self-service, resource pooling, and measured service), for integrating services, users, and user communities. It's not a Kubernetes or Nextcloud service for European scientists.

When a researcher who needs scientific IT resources can go to the EOSC site, find a suitable service, figure out the cost scheme for their use, and be able to get to work during the same day, EOSC will be successful.

EOSC is ambitious, and I'm afraid that overly high early expectations will be detrimental to it. The hard problems (e.g. making authorization, costs, contracts, SLAs and reporting trivial for customers and providers) must be solved, but that takes time. If EOSC can deliver good basic rules and tools for federation, with a focus on making it easy for end users, it will be a great step forward. Not only will the researchers benefit, but the providers will also gain economies of scale by building services for larger audiences.

Will the “C” in EOSC be there in 2019? I doubt it, at least not a large part of it. Will EOSC be completely useless in 2019? No, but it will only be able to serve some selected use cases. I expect greater benefits to be reaped within 3–5 years, IF there’s active development in a good direction.
 

FPGAs and scientific code

Accelerators are not a new thing in data centers. Deep learning and cryptocurrencies have made the largest waves when it comes to using GPGPUs for acceleration, but they are not the only codes that benefit from it. More and more scientific codes are also using GPGPUs for acceleration.

It looks like the next step is FPGAs. Recently FPGAs have become available in commercial IaaS services. Generic accelerator support is also maturing in e.g. OpenStack with the Cyborg service.

FPGAs are often used for accelerating deep learning workloads. However, as with any other type of acceleration, a wide range of computations benefits from FPGAs. I think we'll see more and more forays into FPGAs for scientific computation. Apart from the early adopters, the growth will probably be slow, as it is a new computing paradigm. However, cloud services will provide an easy way to dip your toes into this, for both application developers and users.
 

Scientific data storage

In many cases, scientific data storage usage still follows old patterns. Copy the data from a laptop/USB disk/lab server to a computing cluster/VM/etc. and compute on it. Copy the results somewhere, maybe back to the laptop, play around with them. Maybe you copy the data somewhere else for visualization, or further processing, and you juggle a few copies and versions, and try not to mix them up.

These models aren’t really efficient, nor easy to use. The future workflows will revolve much more around the data itself. Either the data is produced directly into, or you upload it to, generally accessible storage, most likely an object storage service.

As the data is accessible from wherever you need, you’ll point the computational platforms to the data, and the data location won't change (except temporary copies for computational purposes) throughout the whole analysis workflow.

As these data services are accessible from anywhere you need, it is easy to combine tools from many different providers, which can poke at the data no matter where they are produced. A lot of tooling still needs to be built, but I expect that tools and processes will become mature and usable.

Again, this change will take time, as it needs changing user behavior. However, the rising demand for FAIR (Findable, Accessible, Interoperable, Reusable) principles for research data will probably accelerate this, since the same models make it easier to at least provide the “A” for FAIR.
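As a concrete, hypothetical illustration of the pattern, working against an S3-compatible object store could look like the sketch below; the endpoint, credentials, bucket and object names are placeholders.

# Hypothetical sketch of the "data stays in one place" pattern against an
# S3-compatible object store. Endpoint, credentials and names are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://object.example.org",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Upload the raw data once...
s3.upload_file("measurements.h5", "my-project", "raw/measurements.h5")

# ...and later point any computing platform at the same object.
s3.download_file("my-project", "raw/measurements.h5", "/scratch/measurements.h5")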
 

Scientific OpenStacks

In my (anecdotal) experience, the number of OpenStack installations by scientific infrastructure providers grew significantly last year.

The IaaS paradigm has made it easier to manage infrastructure more systematically, both for the customers and the providers of the infrastructure. IaaS fills a different need than e.g. the HPC clusters that scientific computing service providers have traditionally run.

While HPC clusters are somewhat easily usable by end-users, IaaS services provide a more generic infrastructure layer. However, for many scientific OpenStack use-cases (and I’m sure other use-cases too), IaaS is often still seen as an end-product to the users, rather than a generic improvement on infrastructure management.

The OpenStack Summit renamed itself the Open Infrastructure Summit, reflecting the trend that OpenStack is not merely a cloud product to be used; it's part of having a software-defined infrastructure.

The focus has started moving from “Do we have an IaaS offering?” to “Is our whole IT infra software defined?”. In the latter question OpenStack is a part of the answer, but not the whole answer.

 

 
”This will have a big impact on the availability of scientific IT resources, but it will also push OpenStack itself a bit behind the scenes."
 



I think that many OpenStack installations for scientific use will follow this path. They will no longer be “an OpenStack installation for purpose X” but “Scientific IT resources, usable by X, Y, Z, and our organization is also by the way running our web pages there.”

It will take some time, as it does require quite a high level of maturity from the organization. This will have a big impact on the availability of scientific IT resources, but it will also push OpenStack itself a bit behind the scenes.

That’s not a bad thing, since the services built on top of the open infrastructure are more interesting than the infrastructure itself.

Except of course to cloud-geeks like me.


PICTURE: ADOBE STOCK

Blogger: Kalle Happonen Blog Topic: HPC CSC: Blog Themes: Laskennallinen tiede

Over the past few years, the quantum computer has climbed from speculation and basic research into genuine commercial products and product development.

A quantum computer is meant to exploit the phenomena of the strange world of quantum mechanics to solve certain types of computational problems quickly. There are several implementation models for quantum computers, and a fierce race is currently under way between them. Time will tell which technologies survive.

For example, IBM has announced its gate-model Q quantum computer, and Microsoft is developing the StationQ quantum computer. A third model, which perhaps promises the least but is technically the easiest to realise, is the adiabatic model. The Canadian company D-Wave Systems announced a commercial quantum computer based on it already a few years ago.

In any case, the field is still in its infancy, both in terms of technology and of programming models. An apt analogy is to compare the current state of quantum computers to classical computers in the 1950s.

The qubit replaces the bit of a conventional computer

A quantum machine operates very differently from a traditional computer. In quantum computing, the bit of a conventional computer (0 or 1) is replaced by the quantum bit, or qubit, which can be 0 and 1 at the same time, that is, in a superposition of the two states.

Superposition alone is not enough: quantum computing also requires the quantum phenomenon known as entanglement. In entanglement, quantum states can be coupled to each other in a way that has no counterpart in the everyday world. The state of one qubit can be entangled with the superposition state of another qubit so that determining the value of one qubit (0 or 1) immediately fixes the value of the other as well. Together these enable complex simultaneous computation.

D-Wave's quantum computer is based on quantum annealing. The processor can be tuned into a superposition of all solutions, in which every qubit is both 0 and 1. By unwinding the superposition slowly enough, the processor settles into the pure state of minimum energy, in which each qubit is either 0 or 1.
 

Which of the valleys is the lowest?

This can be illustrated by thinking of an aerial photograph of a mountainous region and asking: which of the valleys is the lowest? The quantum computer samples all valleys, that is, all possible states, simultaneously, whereas a classical algorithm would look for the deepest valley by comparing them one at a time.

Strictly speaking, D-Wave's quantum computer solves an unconstrained quadratic binary optimisation problem in a single processor cycle, exploiting the quantum mechanical phenomena discussed above. Whereas the basic operations of an ordinary computer are arithmetic, the basic operation of the D-Wave machine is the minimisation task described above.

In other words, every problem to be computed on this quantum computer must be expressed as a minimisation task! Simplifying a little, one can think of it as follows: if we want to solve a minimisation problem on a normal computer, we need a numerical optimisation algorithm that ultimately performs additions and multiplications, and this is shaped into a computer program. If, on the other hand, we wanted to compute a simple addition on D-Wave's quantum computer, it would have to be expressed, through many intermediate layers, as a minimisation task.
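To make the idea concrete, here is a tiny classical sketch of the kind of problem the annealer solves in a single cycle: minimising a quadratic function of binary variables. The coefficients are arbitrary toy values, and the minimum is found here by plain enumeration rather than by quantum annealing.

# Toy illustration: the annealer minimises an objective of the form
#   sum_i h_i * x_i + sum_{i<j} J_ij * x_i * x_j
# over binary variables x_i. Here we simply enumerate all assignments
# classically; the coefficients are arbitrary example values.
from itertools import product

h = {0: -1.0, 1: 0.5, 2: -0.5}                 # linear terms
J = {(0, 1): 1.0, (0, 2): -2.0, (1, 2): 0.5}   # quadratic couplings

def energy(x):
    return (sum(h[i] * x[i] for i in h)
            + sum(J[i, j] * x[i] * x[j] for (i, j) in J))

best = min(product((0, 1), repeat=3), key=energy)
print("lowest-energy assignment:", best, "with energy", energy(best))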

It is therefore evident that the problems suited to the D-Wave computer are optimisation tasks by nature, and it is not even worth attempting anything else on it. There are, of course, plenty of such tasks in scientific computing. A good example is that most computational problems in physics are ultimately energy minimisation problems.

One can reasonably expect the D-Wave machine to be able to solve even optimisation problems that are impossible for the fastest conventional supercomputers. The remaining challenges, however, are expressing the research question as an optimisation problem and the sufficiency of the number and connectivity of the qubits in the current machine.

A number of solutions to scientific problems obtained with the D-Wave machine have already been published. The most convincing of these are a simulation of a complex quantum physical system published in Nature in August, and work reported in Science in June, likewise in the field of complex quantum systems.
 

CSC has organised workshops on programming a quantum computer

There is no quantum computer in Finland, at least not yet, but CSC – IT Center for Science has twice organised a workshop on programming the D-Wave machine, where participants have had hands-on access to the quantum computer located in Canada. D-Wave has also donated quantum processor time for CSC to distribute.

Finnish researchers, too, have formulated scientific problems for the D-Wave machine, among other things on the simulation of quantum systems. These results are, however, still unpublished.

Quantum computer algorithms are a true terra incognita; at the moment, every successful computation on a quantum computer is a scientific finding worth publishing.

By investing in the development of quantum computing expertise, Finland could already now create internationally competitive research at the front line of the field. After all, the quantum computer is here today, and in all likelihood it is here to stay.

IMAGE: THINKSTOCK

Blogger: Pekka Manninen Blog Topic: HPC CSC: Blog Viewpoints: Supertietokoneet

The saying Everything is big in Texas held true at this year's Supercomputing conference (SC18) as well, an annual conference held in the United States that focuses on high-performance computing, networking, storage and data analysis. The 30th conference in the series gathered a record-breaking crowd of more than 13,000 participants from around the world in Dallas, Texas, on 11–16 November 2018.

A few experts from CSC were also there to hear about and learn the latest trends and technologies in supercomputing and to meet partners during a very tightly scheduled week. I was there representing PRACE (Partnership for Advanced Computing in Europe), the European high-performance computing infrastructure.

The super week included, among other things, an exhibition in a huge hall the size of about fourteen football pitches. During the week, 12 presentations were given at the PRACE booth, covering topics such as scientific results achieved with the computing resources provided by PRACE, the latest Scientific Case report, and PRACE's service and training offering.

The conference had a record total of 364 exhibitors.
 

The PRACE booth at the SC18 conference in Dallas, Texas.


Another superlative of the event was the world's most powerful temporary network, SCinet, with a capacity of 4.02 terabits per second. Building this volunteer-run network took over 100 kilometres of cabling, and 300 wireless access points were installed in the conference exhibition hall for it. The price tag of the network came to 52 million US dollars.
 

Artificial intelligence and exaflops

This year, in addition to the high-performance computing programme, the technical programme of the conference put particular emphasis on artificial intelligence, especially machine learning. A key prerequisite for developing artificial intelligence is powerful computing, which is done on supercomputers.

The keynote by Erik Brynjolfsson, "HPC and Artificial Intelligence – Helping to Solve Humanity's Grand Challenges", likewise dealt with the transformative power of high-performance computing and artificial intelligence, especially for solving global problems such as food shortages and various epidemics.
 

The booth of Atos, the supplier of CSC's next computing environment.


On the high-performance computing side, the focus was on the move to exascale computing. One exaflop means processor performance equivalent to 10^18 (a quintillion) floating-point operations per second. The first exascale supercomputers are expected to see the light of day in 2021, and the development of the related technologies, algorithms and other building blocks is already in full swing.

The latest Top500 list of supercomputers was also published at the conference, dominated by the US Department of Energy with five placements in the top 10. This time the top ten included two European supercomputers: Piz Daint, located in Switzerland, and SuperMUC-NG, located in Germany.

Finnish researchers and companies can also obtain computing resources on these two European flagship machines through the PRACE resource calls. The next call for PRACE computing resources opens on 5 March 2019.
 


 

Highlights of the SC18 conference can be read on Twitter under the hashtag #SC18. The European sister of the SC conference series, ISC High Performance (International Supercomputing Conference), will be held on 16–20 June 2019 in Frankfurt, Germany. Research papers can be submitted to the conference until 12 December 2018 through the event's website.

The next event in the SC series, SC19, will be held in November 2019 in Denver, Colorado, USA.
 

Read also: CSC selected Atos as the supplier of Finland's next supercomputer

Customer Groups: For Science Blogger: Anni Jakobsson Blog Topic: Working at CSC HPC CSC: Blog

Understanding of learning analytics was advanced on 11–12 October at EUNIS's international workshop, held at Aalto University's premises in Töölö, Helsinki. CSC was involved in supporting the organisation of the event. More than 70 participants from ten countries arrived in autumnal Helsinki. The event was a follow-up to the one held last year in Manchester.

Learning analytics is one of the most central themes in the development of education right now. It is used to promote student learning, to provide tools to support teaching, and in assessment and as part of knowledge-based management. The most important questions at the moment are what we want to achieve with analytics and what kind of improvement we aim for by using it. These questions, among others, featured strongly in the workshop.

At the event, participants got to explore the possibilities offered by analytics from many angles, guided by learning analytics experts. A central theme was how analytics supports education as a whole. In addition to teaching and learning, there is a desire to use analytics to support, among other things, management and organisational development. For example, Aalto University's goal is to be an even more knowledge-led university than before, and analytics acts as the enabler of this.

There is broad interest and enthusiasm, so analytics is now entering learning at speed, and its use will enable new openings in the future. The clear message, however, was that it is important to pause and identify the needs for using analytics: to ask what benefit analytics brings to one's own organisation, to students, teachers, developers of teaching and management; where we want to get to with the help of analytics; and which of the changes brought by analytics are important for organisations.

One of the interesting talks presented the SHEILA project, which aims to build a European practice for learning analytics. Analytics matters are partly an internal issue for organisations, but framework work is also being done at the national or even international level. Yi-Shan Tsai from the University of Edinburgh said that the practices developed in the project have been found useful. Should we in Finland, too, organise ourselves better to support the use of analytics? Find a shared direction for combining algorithms and pedagogy into a working whole? The workshop once again showed that, right at the outset, we need a definition of what exactly is meant and understood by learning analytics in Finland.

Work on a national framework for learning analytics is getting started, carried out among others by the analytics subcommittee under the Ministry of Education and Culture. While your organisation considers its own needs, you are welcome to take part in the work, which aims to create national guidelines for the use of learning analytics and thereby make its adoption easier.

The discussion on learning analytics continues in connection with the IT Days, at a pre-conference day on 5 November 2018 organised by Oulu University of Applied Sciences and the University of Oulu together with CSC. The aim of the Digital Leap of Continuous Learning forum is to develop and bring together the best digital services and learning-supporting solutions for students, teachers and researchers. This time the themes are analytics and cooperation between business and working life and the education and research actors.


EUNIS (European University Information Systems organisation), the cooperation organisation for IT management in European higher education institutions, brings together higher education IT professionals to develop and share best practices related to information systems. Cooperation takes place especially through EUNIS's working groups. www.eunis.org
 

More information:


Photo: Kalle Kataila

Blogger: Kaisa Kotomäki Blog Topic: Science and research CSC: Blog Viewpoints: Data ja analytiikka koulutuksen kehittämisessä

Research funding has been visible in the media this autumn. There have been articles about shrinking public funding, reports on private foundations' support for science, and discussion of the importance of competitive research funding. So how is research actually funded? Let's take a closer look.

The Academy of Finland funded 970 projects in 2017.
The hundred largest foundations account for roughly four fifths of private funding.
And what about research funded by Business Finland? Or the funding flowing in from the European Union?

Searching for this information raises waves of despair. The information is scattered in different formats across the funders' and universities' own websites. Forming an overall picture eats up time and nerves.

The national research information hub solves the problem. By the end of 2018, the part that brings together funding decisions, the project information repository, will be ready to accumulate information on Finnish competitive research funding. When the funding information is combined with other research information, a comprehensive picture of Finnish research emerges. From the repository, funding information flows in a uniform format to higher education institutions and research institutes. Information on public funding will be included first, and private foundation funding will gradually follow.

Eventually the research.fi portal of the information hub will serve everyone thirsty for information. Competitive funding will appear on the screen with a couple of mouse clicks, and the funders of science will get the visibility they deserve.

The whole will be fine-tuned together with the funders at a seminar at the end of November.

 

Blogger: Walter Rydman Blog Topic: Science and research IT Management CSC: Blog