Variant analysis workshop
Date: 13.06.2016 9:00 - 15.06.2016 17:00
Location details: The event is organised at the CSC Training Facilities, located on the premises of CSC at Keilaranta 14, Espoo, Finland. The best way to reach us is by public transportation; more detailed travel tips are available.
Language: English
Lecturers: Benjamin Moore (EBI)
Sergi Beltran (CNAG)
David Salgado (Aix Marseille Universite)
Hanka Venselaar (CMBI)
Christophe Roos (Euformatics)
Ilkka Lappalainen (CSC)
Maria Lehtivaara (CSC)
Eija Korpelainen (CSC)
Price:
  • Free for Finnish universities, polytechnics and governmental research institutes.
  • Free for others.
Morning and afternoon coffees and lunch are included in the course.
Registration is closed.
When registering with the web form, remember to click the Submit button. You should receive an automatic acknowledgement email straight away. NOTE: If there are more applicants than there are seats in the computer classroom, participants will be chosen based on the motivation description in their registration form. We also make every effort to ensure that each research group gets at least one seat, so please indicate your PI in the registration form. All applicants will be notified once the selection process has been completed on 18.5.2016.

This course covers several aspects of variant analysis, including annotation and prioritization. The course consists of three independent days, which participants can combine according to their needs. Each course day contains both lectures and hands-on exercises, which are suitable for everybody, as the analysis is performed with easy-to-use graphical tools. This course is organized by the ELIXIR EXCELERATE project together with RD-Connect. Please note that course videos from this course and its sister course, Variant analysis with GATK (16.-17.6.2016), are now available.

13.6.2016 Introduction to variant analysis from sequencing data

This introductory day covers variant analysis from raw sequence reads to variant annotation, introducing the theory, analysis tools and file formats involved. The program consists of alternating short lectures and hands-on exercises. As the user-friendly Chipster software is used in the exercises, no previous knowledge of Unix or R is required and the tutorial is thus suitable for everybody. Trainers: Maria Lehtivaara and Eija Korpelainen (CSC). Slides and exercises.

The following topics and analysis tools are covered:

  • quality control (FastQC, PRINSEQ)
  • preprocessing (Trimmomatic)
  • alignment (BWA)
  • alignment-level QC (Samtools)
  • marking duplicates (Picard)
  • calling and filtering variants (Samtools, BCFtools, VCFtools)
  • variant annotation (VEP)
  • visualizing variants in genomic context (Chipster genome browser using Ensembl data)
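
To illustrate what the "calling and filtering variants" step does, here is a minimal Python sketch that applies hard filters to the data lines of a VCF file. It assumes an uncompressed VCF read line by line; the thresholds (QUAL at least 20, INFO depth DP at least 10) are arbitrary example values, not the settings used in the course, where filtering is done in Chipster with tools such as VCFtools.

```python
def parse_info(info):
    """Split a VCF INFO column (e.g. 'DP=14;DB') into a dict; flag fields map to ''."""
    fields = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_filters(line, min_qual=20.0, min_depth=10):
    """Return True if a VCF data line meets the example QUAL and DP thresholds."""
    cols = line.rstrip("\n").split("\t")
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
    qual, info = cols[5], cols[7]
    if qual == "." or float(qual) < min_qual:
        return False
    depth = parse_info(info).get("DP")
    return depth is not None and depth != "" and int(depth) >= min_depth


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and only those data lines that pass the filters."""
    for line in lines:
        if line.startswith("#") or passes_filters(line, **thresholds):
            yield line


# Toy input for illustration only (not course data).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\t.\tA\tG\t50\t.\tDP=25\n",  # kept
    "1\t2000\t.\tC\tT\t12\t.\tDP=30\n",  # low QUAL
    "1\t3000\t.\tG\tA\t60\t.\tDP=4\n",   # low depth
]
kept = list(filter_vcf(example))
for line in kept:
    print(line, end="")
```

Real pipelines usually combine several such criteria (strand bias, mapping quality, genotype quality); the point here is only that filtering is a per-record test on the QUAL and INFO columns.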


14.6.2016 Variant annotation and effect prediction

Session 1: Protein structure analysis of mutations using HOPE (Hanka Venselaar, CMBI)

HOPE is an automatic mutant analysis server that can provide insight into the structural effects of a mutation. It is aimed at users in the biomedical field who wish to visualize and understand their mutation of interest. Given a sequence and a mutation, HOPE shows the effect of that mutation in a way that even users without a bioinformatics background can understand. Slides with exercises (also as PDF).

Session 2: Ensembl: Analysing variation data with the Variant Effect Predictor (Benjamin Moore, EBI)

The Ensembl project provides a comprehensive and integrated source of annotation of mainly vertebrate genome sequences. The session begins with an introduction to the Ensembl project, gene model annotations and variation data, followed by hands-on demonstrations and exercises focused on viewing the range of variation data available in Ensembl. We will also explore how BioMart can be used as a flexible tool for accessing and downloading a variety of different types of variation data from Ensembl.

The second part of the session covers the Variant Effect Predictor (VEP) in detail and how this tool can be used to analyse variation data. VEP determines the effect of variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts and protein sequence, as well as on regulatory regions. Given the coordinates and nucleotide changes of the variants, it reports the genes and transcripts affected and the location of the variants (e.g. upstream of a transcript, in coding sequence, in non-coding RNA, in regulatory regions). It also reports, for example, the consequences of the variants on the protein sequence (e.g. stop gained, missense, stop lost, frameshift) and SIFT and PolyPhen scores for the changes. The hands-on demonstrations and exercises first use the VEP web interface (this part is suitable for everybody), followed by the standalone Perl script and the REST API (this part is suitable for participants with coding experience). Slides and Coursebook with exercises.
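
For participants curious about the REST part, the sketch below shows roughly how a single variant could be sent to VEP over the Ensembl REST API and how the per-transcript results might be summarised. It assumes the public server at https://rest.ensembl.org and its vep/:species/region/:region/:allele endpoint, and the JSON field names (transcript_consequences, gene_symbol, consequence_terms, sift_prediction, polyphen_prediction) reflect our understanding of the response format; please check the Ensembl REST documentation before relying on them. Only single-nucleotide substitutions on the forward strand are handled.

```python
import json
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def vep_region_url(chrom, pos, alt, species="human"):
    """Build a VEP region URL for a single-nucleotide variant (region = chrom:start-end:strand)."""
    region = f"{chrom}:{pos}-{pos}:1"
    return f"{REST_SERVER}/vep/{species}/region/{region}/{alt}?content-type=application/json"


def fetch_vep(url):
    """Send the GET request (needs network access) and return the decoded JSON list."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def summarise(results):
    """Collect (gene, transcript, consequences, SIFT, PolyPhen) tuples from VEP JSON results."""
    rows = []
    for variant in results:
        for tc in variant.get("transcript_consequences", []):
            rows.append((
                tc.get("gene_symbol"),
                tc.get("transcript_id"),
                ",".join(tc.get("consequence_terms", [])),
                tc.get("sift_prediction"),    # may be None, e.g. for non-coding consequences
                tc.get("polyphen_prediction"),
            ))
    return rows
```

A typical call would be summarise(fetch_vep(vep_region_url("7", 140753336, "T"))), which requires an internet connection. For larger input files, the standalone Perl script covered in the session is the more practical route.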

Audience: This course is aimed at wet-lab researchers, clinicians and bioinformaticians who want to get hands-on experience using Ensembl to analyse variation data.

Learning outcomes

  • Learn how to view variation data in the Ensembl browser
  • Learn how to analyse your own variation data with the Variant Effect Predictor (VEP)

15.6.2016 Variant analysis and prioritization

Session 1: European Genome-phenome Archive (Ilkka Lappalainen, CSC)

The European Genome-phenome Archive (EGA) is a permanent archive that promotes the distribution and sharing of genetic and phenotypic data consented for specific approved uses but not fully open, public distribution. The EGA follows strict protocols for information management, data storage, security and dissemination. Authorized access to the data is managed in partnership with the data-providing organizations. The EGA includes major reference data collections for human genetics research. Slides.

Session 2: RD-Connect genomics analysis platform (Sergi Beltran, Centro Nacional de Análisis Genómico) 

The RD-Connect genomics analysis platform connects databases, registries, biobanks and clinical bioinformatics for rare disease research. The genomics side of the platform already includes hundreds of exomes linked to detailed phenotypes stored in PhenoTips using the Human Phenotype Ontology. The exomes have been processed with the RD-Connect standard analysis pipeline for genomics, which achieves over 99% precision and sensitivity when compared to the National Institute of Standards and Technology (NIST) reference set of calls for NA12878. The exomes can be combined very flexibly, and variants can be filtered and prioritized through the user-friendly interface using the most common quality, genomic location, effect, pathogenicity and population frequency annotations, including CADD and ExAC. In this session you will learn the workflow for submitting data to the RD-Connect platform and how to access and analyse your data. The hands-on part allows you to try the genomics analysis platform yourself. Slides and exercises.
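
The precision and sensitivity figures mentioned above are computed by comparing the pipeline's calls with the reference calls: precision = TP / (TP + FP) and sensitivity = TP / (TP + FN), where TP are calls present in the reference set, FP are calls absent from it and FN are reference variants that were missed. The minimal sketch below assumes variants are matched exactly as (chrom, pos, ref, alt) tuples and uses made-up toy sets; real benchmarks against the NIST NA12878 calls also restrict the comparison to high-confidence regions and normalise variant representation, which this sketch ignores.

```python
def benchmark(calls, truth):
    """Return (precision, sensitivity) of a call set against a truth set of variant tuples."""
    tp = len(calls & truth)  # called and in the truth set
    fp = len(calls - truth)  # called but not in the truth set
    fn = len(truth - calls)  # in the truth set but not called
    precision = tp / (tp + fp) if calls else 0.0
    sensitivity = tp / (tp + fn) if truth else 0.0
    return precision, sensitivity


# Toy sets for illustration only.
truth = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A"), ("2", 75, "T", "C")}
calls = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A"), ("3", 10, "A", "C")}

precision, sensitivity = benchmark(calls, truth)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f}")  # precision=0.75 sensitivity=0.75
```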

Session 3: Variant prioritization with UMD-Predictor, VarAFT and Human Splicing Finder (David Salgado, Aix Marseille Universite)

Sequencing generates a huge number of variants. Even after quality checks, thousands of variants need to be prioritized to highlight the potentially causative mutations, which often number only one or two depending on the mode of inheritance. This session covers the following systems, which help with these tasks:

  • UMD-Predictor, a pathogenicity prediction system for any human cDNA substitution leading to synonymous, missense or nonsense amino acid change.
  • Human Splicing Finder, a pathogenicity prediction system for any intronic or exonic mutation that could impact splicing signals.
  • VarAFT, a prioritization system to analyse and filter variants from NGS data.

During this training session, participants will be introduced to these systems and will use UMD-Predictor and VarAFT on real use cases. Slides 1, slides 2 and exercises.
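
As a rough illustration of how the mode of inheritance narrows thousands of variants down to a few candidates, here is a minimal Python sketch of a genotype-based gene filter. It is a generic example, not how VarAFT or UMD-Predictor work internally: each variant is assumed to be a dict with hypothetical keys gene, genotype (unphased "0/1" or "1/1") and af (population allele frequency, e.g. taken from ExAC), and compound heterozygosity is approximated as two or more rare heterozygous variants in the same gene without checking phase.

```python
from collections import defaultdict


def candidate_genes(variants, mode, max_af=0.01):
    """Return the sorted genes whose rare variants fit 'dominant' or 'recessive' inheritance."""
    if mode not in ("dominant", "recessive"):
        raise ValueError(f"unknown inheritance mode: {mode}")
    by_gene = defaultdict(list)
    for v in variants:
        if v["af"] <= max_af:  # keep only rare variants
            by_gene[v["gene"]].append(v["genotype"])
    genes = set()
    for gene, genotypes in by_gene.items():
        het = genotypes.count("0/1")
        hom = genotypes.count("1/1")
        if mode == "dominant" and het >= 1:
            genes.add(gene)
        elif mode == "recessive" and (hom >= 1 or het >= 2):
            genes.add(gene)
    return sorted(genes)


# Toy variants for illustration only.
variants = [
    {"gene": "A", "genotype": "0/1", "af": 0.001},
    {"gene": "B", "genotype": "1/1", "af": 0.0},
    {"gene": "C", "genotype": "0/1", "af": 0.002},
    {"gene": "C", "genotype": "0/1", "af": 0.003},
    {"gene": "D", "genotype": "1/1", "af": 0.2},  # too common, filtered out
]
print(candidate_genes(variants, "dominant"))   # ['A', 'C']
print(candidate_genes(variants, "recessive"))  # ['B', 'C']
```

In practice this kind of filter is combined with pathogenicity predictions such as those from UMD-Predictor and Human Splicing Finder to rank the remaining candidates.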

Session 4: Genome variant annotation and interpretation in a clinical context using omnomicsNGS (Christophe Roos, Euformatics)

omnomicsNGS is a CE-marked software platform for clinical reporting on patient NGS data. It provides relevant genomic and mutation information for clinicians and molecular genetics laboratories based on patient data and external information sources. It can be tailored to report on markers in specific gene panels, and it can also be used to analyse rare genetic diseases using whole-exome sequencing data. Slides and exercises.

Every course day starts at 9 am and finishes at 5 pm. The detailed schedule will be published here soon.
Course materials