Version
- SSAHA2 version 2.5 is available in Hippu and Vuori.
Description
SSAHA2 is a very fast search tool for finding nearly identical matches for DNA sequences. It uses the hash-based SSAHA algorithm to quickly locate matching regions in the target sequences and then performs the alignment with the cross_match sequence alignment program. If you are looking only for nearly identical matches, or your query sequences are short DNA fragments, SSAHA2 is probably a better choice for database searching than BLAST. SSAHA2 parameters can be tuned via a number of command line options for a wide range of applications:
- mapping of sequence reads (Solexa, ABI, Sanger) to a genomic reference sequence
- polymorphism detection
- cross-species whole genome alignment
- EST/cDNA alignment
- BACends placement
- database searching of reads and fragments
- mapping of segmental duplications
- primer design
SSAHA2 programs
| Program | Description |
|---|---|
| ssaha2 | This program aligns query sequences against a hash table created with ssaha2Build. |
| ssaha2Build | This program reads a FASTA-formatted sequence database and constructs the hash table required by the ssaha2 search programs. |
| ssahaSNP | ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by aligning shotgun reads to the finished genome sequence. From the best alignment, SNP candidates are screened, taking into account the quality value of the bases with variation as well as the quality values of the neighbouring bases, using the neighbourhood quality standard (NQS). |
Using SSAHA2 in Hippu and Vuori
The SSAHA2 commands can be launched by typing their names. For example, to print the command line help for ssaha2, type:
ssaha2 -help
CSC does not maintain SSAHA2 indexes for sequence databases. If you wish to use SSAHA2, you can easily create the SSAHA2 index files yourself. A SSAHA2 search typically includes the following steps:
Move to your $WRKDIR directory
cd $WRKDIR
Download the FASTA-formatted sequence data to your $WRKDIR directory. For example, downloading and uncompressing the human genome from the Ensembl FTP site could be done with the commands:
wget ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.55.dna.toplevel.fa.gz
gunzip Homo_sapiens.GRCh37.55.dna.toplevel.fa.gz
When you have the fasta formatted sequence database in your directory, you can create the ssaha2 index files for the sequence set with the ssaha2Build program.
ssaha2Build -save H_sapiens_ssaha2 Homo_sapiens.GRCh37.55.dna.toplevel.fa
After this, you can start running searches:
ssaha2 -save H_sapiens_ssaha2 query_sequences.fasta > results.txt
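The steps above can be collected into a single script. The following is only a sketch, not an official CSC script. It uses the same commands as the steps above, with gunzip for the .gz archive. The DRY_RUN variable and the run helper are hypothetical additions for illustration, and query_sequences.fasta is a placeholder for your own query file. By default the script only prints the commands it would execute, so it can be checked before starting a long download and index build.

```shell
#!/bin/sh
# Sketch of the SSAHA2 workflow: download, uncompress, build index, search.
# DRY_RUN and run() are hypothetical helpers, not part of SSAHA2.

GENOME_URL="ftp://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.55.dna.toplevel.fa.gz"
INDEX="H_sapiens_ssaha2"            # name given to ssaha2Build -save
QUERY="query_sequences.fasta"       # placeholder: your own query file
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

archive=$(basename "$GENOME_URL")   # file name of the downloaded archive
fasta="${archive%.gz}"              # file name after gunzip removes .gz

# Work in $WRKDIR if it is set, otherwise in the current directory.
cd "${WRKDIR:-.}" || exit 1

run wget "$GENOME_URL"
run gunzip "$archive"
run ssaha2Build -save "$INDEX" "$fasta"

# The search writes to results.txt, so the redirect is handled separately
# to keep the dry-run message out of the result file.
if [ "$DRY_RUN" = "1" ]; then
    echo "+ ssaha2 -save $INDEX $QUERY > results.txt"
else
    ssaha2 -save "$INDEX" "$QUERY" > results.txt
fi
```

If the script is saved as, for example, ssaha2_search.sh (a hypothetical file name), running `sh ssaha2_search.sh` prints the four commands, and `DRY_RUN=0 sh ssaha2_search.sh` executes them.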
Reference
If you use ssaha2, ssahaEST or ssahaSNP for any scientific work please cite the reference below or this web page as appropriate.