Version
- NCBI BLAST version 2.2.20 is available in murska.csc.fi.
- NCBI BLAST version 2.2.20 is available in hippu.csc.fi.
- An optimized blastpgp (PSI-BLAST) is available in murska.csc.fi.
- A WWW interface to BLAST is available in the Scientist's Interface.
Description
BLAST (Basic Local Alignment Search Tool) is the most frequently used sequence homology search tool. Given a probe sequence (nucleotide or protein), BLAST compares it to a sequence database and picks out the sequences with significant similarity to the probe sequence. BLAST uses a heuristic search protocol, which makes searches very fast compared to non-heuristic methods. The heuristics may, however, cause BLAST to miss some significant hits.
The command line version of NCBI-BLAST allows a user to modify all parameters of BLAST, to use special methods like PSI-BLAST and PHI-BLAST, and to analyze large data sets.
In Murska and Corona you can also use the pb (Parallel BLAST) command for large sets of query sequences. The pb program splits a large search job into several subjobs that can be executed simultaneously (more below).
NCBI BLAST includes, for example, the following programs:
- blastall: normal BLAST searches (blastp, blastn, blastx, tblastn, or tblastx)
- blastclust: clusters protein or DNA sequences based on pairwise matches found using the BLAST or MegaBLAST algorithm.
- blastpgp: performs gapped blastp searches and can be used to perform iterative PSI-BLAST and PHI-BLAST searches.
- bl2seq: performs a comparison between two sequences using the BLAST algorithm.
- megablast: uses the greedy algorithm of Webb Miller et al. for nucleotide sequence alignment search and concatenates many queries to save time spent on scanning the database.
Usage of NCBI BLAST
WWW interface
Two BLAST WWW interfaces are available in the Scientist's Interface: one for searching the public databases and one for searching a user's own sequence set.
Command line usage in Hippu.csc.fi
module load blast
After this you can check the command line options of a specific BLAST program
by typing program_name --help, for example:
blastall --help
Example of command line BLAST usage:
BLAST search with a FASTA-formatted protein sequence file "protseq.fasta" against the PDB database:
blastall -p blastp -d pdb -i protseq.fasta -o results.out
The same search using eight processors (option -a 8):
blastall -p blastp -a 8 -d pdb -i protseq.fasta -o results.out
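Before starting a search it can be useful to confirm that the query file really is FASTA-formatted. The following is a minimal sketch that uses only standard shell tools. The helper name is_fasta and the sample files are illustrative assumptions and are not part of NCBI BLAST.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST): succeeds if the first
# non-empty line of the given file starts with ">".
is_fasta() {
    first=$(grep -v '^[[:space:]]*$' "$1" | head -n 1)
    case "$first" in
        ">"*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Create two small sample files so that the sketch runs on its own.
tmpdir=$(mktemp -d)
printf '>seq1 sample protein\nMKTAYIAKQR\n' > "$tmpdir/protseq.fasta"
printf 'MKTAYIAKQR\n' > "$tmpdir/notes.txt"

if is_fasta "$tmpdir/protseq.fasta"; then fasta_ok=yes; else fasta_ok=no; fi
if is_fasta "$tmpdir/notes.txt"; then text_ok=yes; else text_ok=no; fi

echo "protseq.fasta: $fasta_ok, notes.txt: $text_ok"
rm -rf "$tmpdir"
```

The check only looks at the first header line, so it catches a missing header but not errors further down in the file.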
Usage of pb (Parallel BLAST) in Murska and Hippu
If your query sequence set contains fewer than 20 sequences, hippu is probably the most effective platform for the search. However, if your query set contains hundreds or thousands of sequences, using the murska.csc.fi cluster environment is more effective. For such massive BLAST searches you can also use the pb command.
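The rule of thumb above depends on the number of sequences in the query file. As a minimal sketch, the sequences in a FASTA file can be counted from the header lines, which start with ">". The sample file and the variable names are illustrative, and treating every set of 20 or more sequences as a murska case is a simplification made for this sketch.

```shell
#!/bin/sh
# Create a small sample query file (illustrative data).
tmpdir=$(mktemp -d)
query="$tmpdir/querys.fasta"
printf '>est1\nACGTACGT\n>est2\nGGCCTTAA\n>est3\nTTGACCAG\n' > "$query"

# Each sequence has exactly one header line starting with ">".
n_seqs=$(grep -c '^>' "$query")

# The limit of 20 follows the rule of thumb given in the text.
if [ "$n_seqs" -lt 20 ]; then
    platform="hippu"
else
    platform="murska"
fi

echo "$n_seqs query sequences: consider $platform"
rm -rf "$tmpdir"
```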
In Murska and Hippu, you can execute any blastall, megablast or blastpgp command through the pb
command. pb (Parallel BLAST) is designed for situations where the
query file includes several sequences.
In murska.csc.fi it splits the query task into several subjobs that can be run
simultaneously, using the resources of the server very
effectively. For large sets of query sequences, pb can speed up the
search up to 50-fold. Two sample pb commands for murska.csc.fi:
module load blast
pb blastall -p blastn -d embl -i 100_ests.fasta -o results.out
pb blastpgp -d swiss -i protseq.fasta -j 3 -a 4 -o results.out
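To illustrate the general idea of dividing a query set into subjobs, the sketch below splits a multi-sequence FASTA file into chunks of a fixed number of sequences with awk. It is not how pb is implemented, which this page does not describe. The chunk size, the file names and the sample data are assumptions made for the example.

```shell
#!/bin/sh
# Create a sample query file with five short sequences (illustrative data).
tmpdir=$(mktemp -d)
query="$tmpdir/100_ests.fasta"
for i in 1 2 3 4 5; do
    printf '>est%s\nACGTACGTAC\n' "$i"
done > "$query"

chunk_size=2   # sequences per chunk; an arbitrary value for this sketch

# Open a new output file at every chunk_size-th header line.
# Assumes that the input file starts with a header line.
awk -v size="$chunk_size" -v prefix="$tmpdir/chunk_" '
    /^>/ {
        if (n % size == 0) {
            if (out != "") close(out)
            out = sprintf("%s%03d.fasta", prefix, n / size + 1)
        }
        n++
    }
    { print > out }
' "$query"

n_chunks=$(ls "$tmpdir"/chunk_*.fasta | wc -l | tr -d ' ')
n_last=$(grep -c '^>' "$tmpdir/chunk_003.fasta")

echo "$n_chunks chunks; the last chunk holds $n_last sequence(s)"
rm -rf "$tmpdir"
```

Each chunk is a valid FASTA file on its own, so it could be given to a separate search job with the -i option.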
The pb program also allows users to do BLAST searches against their own FASTA-formatted sequence sets.
This is done by replacing the -d option with the option -dbnuc (for nucleotides) or -dbprot
(for proteins). This feature is useful in hippu.csc.fi too. Example:
pb blastall -p blastn -dbnuc my_seq_set.fasta -i querys.fasta -o results.out
With pb, you can also use three additional output formats for BLAST results:
-m 12 prints out the hit sequences as an EMBOSS list file
-m 13 prints out the hit sequences in FASTA format
-m 14 prints out the matching regions of the hit sequences in FASTA format
BLAST databases in Murska and Hippu
| Name | database | source files |
|---|---|---|
| Nucleotides | ||
| arabidopsisN | Arabidopsis thaliana nucleotide sequences | ArabidopsisN.Z |
| embl_est | EMBL EST division | est files |
| embl_gss | EMBL GSS division | gss files |
| embl_htg | EMBL HTG division | htg files |
| embl_others | EMBL excluding the EST, GSS and HTG divisions, and the WGS sets | other files |
| emblnew | EMBL updates | cum files |
| nt | NCBI non-redundant nucleotide database | nt.gz |
| refseq | NCBI RefSeq RNA database | rna.fna.gz files |
| refseq_con | NCBI RefSeq human contigs | all chromosomes |
| Proteins | ||
| nr | NCBI non-redundant protein database | nr.gz |
| pdb | PDB protein structure database | pdb_seqres.txt |
| swiss | Uniprot/Swiss database | uniprot_sprot.fasta.gz |
| trembl | Uniprot/TrEMBL database | uniprot_trembl.fasta.gz |
| uniref100 | UniRef100 database | uniref100.fasta.gz |
| uniref90 | UniRef90 database | uniref90.fasta.gz |
| uniref50 | UniRef50 database | uniref50.fasta.gz |
Users' own databases
Users can do BLAST searches against their own FASTA-formatted sequence sets with the My BLAST WWW interface.
Another option is to use pb in Murska or in Hippu. With pb, your own sequence sets are used as databases by replacing the -d option with the option -dbnuc (for nucleotides) or -dbprot (for proteins). Example:
pb blastall -p blastn -dbnuc my_seq_set.fasta -i querys.fasta -o results.out
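Choosing between -dbnuc and -dbprot requires knowing whether the sequence set holds nucleotides or proteins. The sketch below guesses this from the letters used in the sequence lines. The helper name pb_db_option is hypothetical and not part of pb. The heuristic is deliberately simple: a nucleotide file that contains IUPAC ambiguity codes other than N would be classified as protein.

```shell
#!/bin/sh
# Hypothetical helper (not part of pb): prints "-dbnuc" if all sequence
# lines consist only of the letters A, C, G, T, U and N, else "-dbprot".
pb_db_option() {
    if grep -v '^>' "$1" | tr -d '[:space:]' \
        | tr '[:lower:]' '[:upper:]' | grep -q '[^ACGTUN]'; then
        echo "-dbprot"
    else
        echo "-dbnuc"
    fi
}

# Create two small sample sequence sets (illustrative data).
tmpdir=$(mktemp -d)
printf '>nuc1\nACGTNNacgt\n>nuc2\nGGCCTTAA\n' > "$tmpdir/my_seq_set.fasta"
printf '>prot1\nMKTAYIAKQR\n' > "$tmpdir/my_prot_set.fasta"

nuc_opt=$(pb_db_option "$tmpdir/my_seq_set.fasta")
prot_opt=$(pb_db_option "$tmpdir/my_prot_set.fasta")

echo "my_seq_set.fasta: $nuc_opt"
echo "my_prot_set.fasta: $prot_opt"
rm -rf "$tmpdir"
```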
More information
| Name | Phone | Email |
|---|---|---|
| Korpelainen Eija | +358 9 457 2030 | Eija.Korpelainen at csc.fi |
| Mattila Kimmo | +358 9 457 2708 | Kimmo.Mattila at csc.fi |
| Saren Ari-Matti | +358 9 457 2282 | Ari-Matti.Saren at csc.fi |