EGA is an extensive archive of biomedical patient data

Image: Adobe Stock


The European Genome-phenome Archive (EGA) archives and distributes patient data collected in biomedical studies. EGA is part of ELIXIR, the European data infrastructure that serves the life sciences, and it is one of the most extensive repositories of its kind in the world. The sensitive data has been de-identified to protect the subjects’ privacy, and access to it is strictly regulated. The data is available only for scientific research, and permission to use it is granted to researchers on the basis of detailed personal applications. Secondary use of the material is monitored by the Data Access Committee.

– Many international research funders require that genomes are deposited in international databases of this kind, so that they are available to the wider research community, as long as the research meets the access criteria, says Samuli Ripatti, professor of biostatistics at the University of Helsinki. – This course of action has enabled many scientifically remarkable new discoveries in genetics and other fields.

– For example, global COVID-19 research is currently working intensely to find factors that explain why the disease affects some infected people severely while others have barely any symptoms. This research is based on extensive international collaboration and meta-analysis of research findings. The consortium publishes all of its analysis results openly online, and EGA offers research groups the opportunity to store data centrally on its servers whenever possible.

EGA’s online metadata catalogue is open for anybody to browse, but accessing the data itself requires creating an EGA user account and applying for access to one or several datasets. As soon as access has been granted, the data is available for download. To be eligible for deposit in EGA, data needs to be de-identified and must adhere to EGA’s file format requirements. The largest single data segment is presently related to cancer research.

– My research group and I study the genetic risk factors of endemic diseases and their correlation with lifestyle choices and other non-genetic factors, Ripatti explains. – For instance, we have discovered many genes and genomes that regulate cholesterol metabolism, as well as genome variants that predispose to or protect from cardiovascular diseases. We have also created risk algorithms that use genomic information and other risk factors to assess a person’s risk of developing, for example, common cancers or diabetes.

– This research relies on extensive population data that includes both each subject’s genetic profile and health information. Finnish and international biobanks, and data deposited in, for example, EMBL-EBI’s EGA or NHGRI’s dbGaP, are research material of this kind, and they are based on personal consent.

– Europe-wide support for sensitive data management requires close collaboration within the European Union and the ability to network with partners outside the EU as well, says Ilkka Lappalainen, development manager at CSC. – EGA’s strength lies in ensuring data compatibility through global standards, as well as in its explicit and secure data management model, which supports secondary use of research results. For several years, CSC has participated in developing the new federated EGA service (FEGA), coordinated by the ELIXIR research organization. FEGA is maintained by CSC to support Finnish research and is set for launch in early 2021.

– The national FEGA service ensures that domestic data remains in Finland, makes our data prominent as a part of international research, and enables secure data analysis within CSC’s computing environment for research and education. The FEGA service is also being actively developed in Europe for managing the research data collected from COVID-19 patients. We also collaborate with Professor Samuli Ripatti, says Heikki Lehväslaiho, sensitive data expert at CSC.





Check out CSC’s new data management site and service catalog. Are you applying for funding from the Academy of Finland? The information package for Academy applicants gathers useful links to our new data management service.


Tero Aalto

The author is a language technologist and works with the Language Bank of Finland.