"Who is there and what are they doing?"

Heta Koski, Eija Korpelainen

CSC course on metagenomics sheds light on studying microbial communities.

Metagenomics is the study of genetic material recovered directly from environmental samples. It can answer questions like "who is there?" and "what are they doing?".

Microbial communities are investigated in many research fields, ranging from the remediation of polluted soil to the role of the human gut microbiome in disease. Metagenomics also allows us to discover new microbial proteins that can be used in biotechnology.

The whole metagenomics field is growing exponentially, as the recent advances in sequencing technologies have enabled more research groups to do metagenomics and perform larger studies. However, analyzing the data has become a bottleneck.

"Big demand for training"

In order to address this issue, CSC organized an international metagenomics data analysis course in April 2017 as part of the ELIXIR EXCELERATE and PRACE projects. The teachers included specialists from the Norwegian ELIXIR node, the European Bioinformatics Institute, Finnish universities and CSC.

– We are really happy that EXCELERATE and PRACE enabled us to organize this course, and that so many experts were willing to come to teach. There is clearly a big demand for metagenomics training, as we have 50 participants from 11 countries, and some applicants were left out due to space limitations. In order to enable a larger number of people to benefit from the course, we are recording the lectures and making the videos and training material available, explains Dr Eija Korpelainen, the ELIXIR-Finland training coordinator from CSC.

Lecturers Nils Willassen, Anu Mikkonen, Jenni Hultman and Eija Korpelainen.


Right tools for simpler analysis

Dr Jenni Hultman from the University of Helsinki knows from experience that a need for this kind of course truly exists in Finland. She studies arctic microbial communities and gave a lecture on assembling genomes from metagenomics data.

– When I first got interested in metagenomics data analysis and wanted to know more, there was practically no one in Finland who could have helped me. So I had to go abroad. Some researchers have had access to datasets that nobody in their research group could analyze. Now they've seen it is actually not so hard.

Dr Anu Mikkonen from the University of Jyväskylä also finds the course worthwhile. She lectured on microbiome analysis and experimental design, using examples from her work on soil research.

– I've already heard many comments that this course is really useful. I have also learned a lot! 

Metagenomics data analysis is actually quite simple if you have the right tools, confirms Dr Nils Willassen. He and his team from ELIXIR-Norway presented the META-pipe analysis pipeline that they have developed, and the participants got hands-on training in using it.

– One thing I try to remind the researchers of: contact us before your project starts, so we can help you to design the experiment, says Willassen.

Need for computing resources will grow immensely

When asked about the challenges that metagenomics researchers encounter, all three lecturers mention the number of samples. Either there is a huge number of samples, which makes the data analysis a bottleneck, or there are too few samples and therefore not enough replicates for statistical analysis.

– While sequencing is getting cheaper, the data analysis requires a lot of computing resources, says Hultman.

When every research group can do metagenomics analysis at a reasonable cost, the need for computing resources will grow immensely, Mikkonen points out.

– While our META-pipe analysis pipeline is publicly available, one major challenge is that we cannot offer computing resources for all the researchers in the world. We need to find sustainable solutions for covering the computing costs, Willassen explains.

– And computing resources aren't the only challenge. Storage capacity and costs also play a big role.

The course program, training materials and lecture videos are available on the course website.


ELIXIR is an intergovernmental organisation that brings together life science resources from across Europe including databases, software tools, training materials, cloud storage and supercomputers. CSC is the ELIXIR node in Finland. EXCELERATE funding helps ELIXIR coordinate and extend national and international data resources to ensure the delivery of world-leading life-science data services. It supports a pan-European training programme, anchored in national infrastructures, to increase bioinformatics capacity and competency.

PRACE (the Partnership for Advanced Computing in Europe) is an international non-profit association. This research infrastructure provides a persistent world-class high-performance computing service for scientists and researchers from academia and industry in Europe, and it is supported by the EU's Horizon 2020 Research and Innovation Programme.