Version
BWA 0.6.1 is available in Hippu and Vuori and as a grid implementation.
BWA 0.5.9 is available in Murska.
BWA can also be used through the Chipster graphical user interface.
Description
Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome. It implements two algorithms, BWA-aln and BWA-SW. The former works for query sequences shorter than 200 bp and the latter for longer sequences up to around 100 kbp. Both algorithms do gapped alignment. BWA can be used to align both single-end and paired-end reads to a reference genome or sequence set.
Usage
To initialize the program, give the command:
module load bwa
The basic syntax of BWA commands is:
bwa <command> [options]
Reference genome indexing
CSC does not maintain pre-compiled BWA indexes for reference genomes. Thus the first step in creating an alignment with BWA is normally to download the reference genome and index it. Please note that your $HOME directory is often too small for working with complete genomes. Instead, you should do the analysis in temporary directories like $WRKDIR, $METAWRK or $FCWRKDIR.
You can use, for example, the ensemblfetch or wget command to download a reference genome to CSC. For example:
ensemblfetch homo_sapiens
The command above retrieves the human genome sequence to a file called Homo_sapiens.GRCh37.63.dna.toplevel.fa. You can calculate the BWA indexes for this file with the command:
bwa index -a bwtsw Homo_sapiens.GRCh37.63.dna.toplevel.fa
Note that for small reference genomes (less than 2 GB) you can use the faster "is" indexing algorithm ( bwa index -a is ).
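The choice between the two indexing algorithms can be made automatically. The following is a minimal sketch, not part of BWA itself: the function name choose_index_algo is hypothetical, and it assumes that the size of the FASTA file in bytes is a good enough proxy for the 2 GB limit mentioned above.

```shell
# Hypothetical helper: pick a "bwa index -a" algorithm from the reference size.
# Argument: size of the reference FASTA file in bytes.
choose_index_algo() {
    size_bytes=$1
    limit=$((2 * 1024 * 1024 * 1024))   # the 2 GB limit, expressed in bytes
    if [ "$size_bytes" -lt "$limit" ]; then
        echo "is"       # small reference: the faster algorithm can be used
    else
        echo "bwtsw"    # large reference, such as the human genome
    fi
}

# Example use (commented out so that loading this sketch does not start indexing):
# ref=Homo_sapiens.GRCh37.63.dna.toplevel.fa
# bwa index -a "$(choose_index_algo "$(wc -c < "$ref")")" "$ref"
```

If you are unsure about the size of the reference, bwtsw is the safe choice, as it is the algorithm used for the human genome above.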
Single-end alignment
Once the indexing is ready, you can carry out the alignment for single-end reads with the command:
bwa aln Homo_sapiens.GRCh37.63.dna.toplevel.fa reads.fastq > aln_sa.sai
The result file is in the BWA-specific .sai format, which you can convert to SAM format with the bwa samse command:
bwa samse Homo_sapiens.GRCh37.63.dna.toplevel.fa aln_sa.sai reads.fastq > aln.sam
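When many read files are processed, it can help to generate the two commands above from the file names. The sketch below only prints the commands so that they can be reviewed before running. The function name print_se_commands and the convention of naming the output files after the reads file are assumptions made for this illustration, not BWA features.

```shell
# Hypothetical helper: print (do not run) the commands for a single-end alignment.
# Arguments: indexed reference FASTA file, FASTQ file with the reads.
print_se_commands() {
    ref=$1
    reads=$2
    base=${reads%.*}    # strip the last extension, e.g. reads.fastq -> reads
    echo "bwa aln $ref $reads > $base.sai"
    echo "bwa samse $ref $base.sai $reads > $base.sam"
}

# Example use (commented out):
# print_se_commands Homo_sapiens.GRCh37.63.dna.toplevel.fa reads.fastq
```

Printing instead of executing keeps the sketch safe to try: the output can be checked by eye and then saved into a job script. File names containing spaces are not handled.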
Paired-end alignment
In the case of paired-end reads you should have read pairs in two matching fastq files. In this case you first do a separate alignment run for each read file:
bwa aln Homo_sapiens.GRCh37.63.dna.toplevel.fa reads1.fq > aln1.sai
bwa aln Homo_sapiens.GRCh37.63.dna.toplevel.fa reads2.fq > aln2.sai
The two .sai alignment files are then combined with the bwa sampe command:
bwa sampe Homo_sapiens.GRCh37.63.dna.toplevel.fa aln1.sai aln2.sai reads1.fq reads2.fq > aln.sam
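Because the paired-end workflow expects two matching FASTQ files, it can be useful to confirm that both files contain the same number of reads before starting the alignment. The following is a minimal sketch with hypothetical function names (fastq_count, check_pairs). It assumes that every FASTQ record takes exactly four lines, and it compares only the read counts, not the read names.

```shell
# Hypothetical helper: count the reads in a FASTQ file,
# assuming four lines per record.
fastq_count() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Hypothetical helper: succeed only if both files hold the same number of reads.
check_pairs() {
    n1=$(fastq_count "$1")
    n2=$(fastq_count "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $n1 vs $n2 reads" >&2
        return 1
    fi
}

# Example use (commented out):
# check_pairs reads1.fq reads2.fq
```

Equal counts do not prove that the mates are in the same order in both files, but a mismatch is a quick sign that the files do not belong together.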
Running BWA alignments utilizing grid computing
Aligning millions of reads to a large reference genome can take several hours or even days. Using grid computing through the grid_bwa command, you can speed up the alignment process tenfold or more. The grid_bwa command splits the alignment task into several subtasks that it submits to be executed simultaneously in the FGI grid environment. When all the subtasks are ready, they are collected and combined into a single result alignment.
To be able to use the grid_bwa command you should have:
- A valid grid certificate installed on the hippu.csc.fi server.
- Membership of the FGI Virtual Organization.
Please check the detailed instructions to obtain and install a grid certificate and to join the FGI Virtual Organization.
Once you have the certificate installed and the Virtual Organization membership is approved, you can submit grid_bwa jobs with, for example, the following commands:
module load bwa
module load nordugrid-arc
grid-proxy-init -rfc -valid 72:00
grid_bwa aln -query seq_set1.fq -query2 seq_set2.fq -ref ref_genome.fasta -out paired_end_results
grid_bwa -help