BLAST(old)

Version

    • NCBI BLAST version 2.2.20 is available in murska.csc.fi.
    • NCBI BLAST version 2.2.20 is available in hippu.csc.fi.
    • An optimized blastpgp (PSI-BLAST) is available in murska.csc.fi.
    • A WWW interface to BLAST is available in the Scientist's Interface.


Description

BLAST (Basic Local Alignment Search Tool) is the most frequently used sequence homology search tool. Given a probe sequence (nucleotide or protein), BLAST compares it to a sequence database and picks out sequences with significant similarity to the probe sequence. BLAST uses a heuristic search protocol, which makes searches very fast compared to non-heuristic methods. The heuristics may, however, cause BLAST to miss some significant hits.

The command line version of NCBI-BLAST allows a user to modify all parameters of BLAST, to use special methods like PSI-BLAST and PHI-BLAST, and to analyze large data sets.

In Murska and Hippu you can also use the pb (Parallel BLAST) command for large sets of query sequences. The pb program splits a large search job into several subjobs that can be executed simultaneously (more below).

NCBI BLAST includes, among others, the following programs:

  • blastall: normal BLAST searches (blastp, blastn, blastx, tblastn, or tblastx)
  • blastclust: clusters protein or DNA sequences based on pairwise matches found using the BLAST or MegaBLAST algorithm.
  • blastpgp: performs gapped blastp searches and can be used to perform iterative PSI-BLAST and PHI-BLAST searches.
  • bl2seq: performs a comparison between two sequences using the BLAST algorithm.
  • megablast: uses the greedy algorithm of Webb Miller et al. for nucleotide sequence alignment search and concatenates many queries to save time spent on scanning the database.



Usage of NCBI BLAST

WWW interface

Two BLAST WWW interfaces are available in the Scientist's Interface: one for searching the public databases and one for searching the user's own sequence set.



Command line usage in hippu.csc.fi

To set up NCBI BLAST in hippu.csc.fi, give the command:

module load blast

After this you can check the command line options of a specific BLAST program by typing program_name --help, for example:

blastall --help

Example of command line BLAST usage:

BLAST search with the fasta-formatted protein sequence file "protseq.fasta" against the PDB database:

blastall -p blastp -d pdb -i protseq.fasta -o results.out


Hippu.csc.fi is a very effective platform for BLAST searches: it has plenty of memory and several multicore processors that BLAST can utilize. To speed up your search, you can use more than one processor for computing. This is done with the option -a. For example, the command below does the search using 8 processors:

blastall -p blastp -a 8 -d pdb -i protseq.fasta -o results.out
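
When choosing a value for -a, it can help to keep it within the number of processors the machine actually reports. The following is a minimal sketch of that idea: pick_threads is a hypothetical helper (not part of BLAST), and it assumes that getconf _NPROCESSORS_ONLN is available on the system.

```shell
# Hypothetical helper (not part of BLAST): cap a requested -a value
# at the number of processors the system reports as online.
pick_threads() {
  requested=$1
  # Assumption: getconf _NPROCESSORS_ONLN works here; fall back to 1 if not.
  available=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}

# Intended use, with the same file names as in the example above:
#   blastall -p blastp -a "$(pick_threads 8)" -d pdb -i protseq.fasta -o results.out
```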




Usage of pb (Parallel BLAST) in Murska and Hippu


If your query sequence set contains fewer than 20 sequences, hippu is probably the most effective platform for the search. However, if your query set contains hundreds or thousands of sequences, then utilizing the murska.csc.fi cluster environment is more effective. For this kind of massive BLAST search you can also utilize the pb command.
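
The size of a query set is easy to check before choosing a server, because every sequence in a fasta file starts with a ">" line. The sketch below counts the sequences and turns the guideline above into a suggestion. Both function names are hypothetical, and the cut-off of 100 for "hundreds" is an assumption of this sketch; the guideline gives no recommendation for sets between 20 and that size.

```shell
# Count the sequences in a FASTA file: each record starts with a ">" line.
count_seqs() {
  grep -c '^>' "$1" || true   # grep exits non-zero on zero matches; still prints 0
}

# Hypothetical helper: map the sequence count to a suggested server.
# The limit of 100 for "hundreds" is an assumption of this sketch.
suggest_platform() {
  n=$(count_seqs "$1")
  if [ "$n" -lt 20 ]; then
    echo "hippu"
  elif [ "$n" -ge 100 ]; then
    echo "murska"
  else
    echo "either"
  fi
}

# Intended use:
#   suggest_platform 100_ests.fasta
```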

In Murska and Hippu, you can execute any blastall, megablast or blastpgp command through the pb command. pb (Parallel BLAST) is designed for situations where the query file includes several sequences. In murska.csc.fi it splits the query task into several subjobs that can be run simultaneously, which uses the resources of the cluster very effectively. For large sets of query sequences, pb can speed up the search by up to 50-fold. Two sample pb commands for murska.csc.fi:

module load blast

pb blastall -p blastn -d embl -i 100_ests.fasta -o results.out

pb blastpgp -d swiss -i protseq.fasta -j 3 -a 4 -o results.out
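
The idea of splitting a query file into subjobs can be illustrated with a small sketch. This is not a description of how pb is implemented, since its internals are not covered here; split_fasta, the chunk size and the output file naming are hypothetical choices made only for illustration. Each resulting chunk could then be searched as a separate job.

```shell
# Hypothetical helper: split a multi-sequence FASTA file into chunks of
# at most N sequences each, written as PREFIX_001.fasta, PREFIX_002.fasta, ...
split_fasta() {
  # $1 = input FASTA file, $2 = sequences per chunk, $3 = output prefix
  awk -v n="$2" -v prefix="$3" '
    /^>/ {
      # Start a new chunk file at every n-th header line.
      if (count % n == 0) {
        if (out) close(out)
        out = sprintf("%s_%03d.fasta", prefix, ++chunk)
      }
      count++
    }
    # Lines before the first header are skipped; all others go to the current chunk.
    { if (out) print > out }
  ' "$1"
}

# Intended use:
#   split_fasta 100_ests.fasta 10 chunk
```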

The pb program also allows users to do BLAST searches against their own fasta formatted sequence sets. This is done by replacing the -d option with the option -dbnuc (for nucleotides) or -dbprot (for proteins). This feature is also useful in hippu.csc.fi. Example:

pb blastall -p blastn -dbnuc my_seq_set.fasta -i querys.fasta -o results.out
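
If it is unclear whether a sequence set should be given with -dbnuc or -dbprot, a rough check of its alphabet can help. The sketch below is a heuristic only: guess_db_option is a hypothetical helper, and the 90 % threshold and the set of characters counted as nucleotide codes are assumptions of this sketch, not part of pb.

```shell
# Hypothetical helper: guess whether a FASTA file holds nucleotide or
# protein sequences and print the matching pb option.
guess_db_option() {
  # All sequence characters: header lines and whitespace removed.
  total=$(( $(grep -v '^>' "$1" | tr -d '[:space:]' | wc -c) ))
  # Characters that look like nucleotide codes (assumption: ACGTUN only).
  nuc=$(( $(grep -v '^>' "$1" | tr -cd 'ACGTUNacgtun' | wc -c) ))
  # Assumption: at least 90 % nucleotide codes means a nucleotide set.
  # A file without any sequence characters falls through to -dbprot.
  if [ "$total" -gt 0 ] && [ $(( nuc * 100 / total )) -ge 90 ]; then
    echo "-dbnuc"
  else
    echo "-dbprot"
  fi
}

# Intended use:
#   guess_db_option my_seq_set.fasta
```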

With pb, you can also use three additional output formats for BLAST results:

  • -m 12 prints out the hit sequences as an EMBOSS list file
  • -m 13 prints out the hit sequences in fasta format
  • -m 14 prints out the matching regions of the hit sequences in fasta format



BLAST databases in Murska and Hippu

Name          Database                                                         Source files

Nucleotides
arabidopsisN  Arabidopsis thaliana nucleotide sequences                        ArabidopsisN.Z
embl_est      EMBL EST division                                                est files
embl_gss      EMBL GSS division                                                gss files
embl_htg      EMBL HTG division                                                htg files
embl_others   EMBL excluding the EST, GSS and HTG divisions, and the WGS sets  other files
emblnew       EMBL updates                                                     cum files
nt            NCBI non-redundant nucleotide database                           nt.gz
refseq        NCBI RefSeq RNA database                                         rna.fna.gz files
refseq_con    NCBI RefSeq human contigs                                        all chromosomes

Proteins
nr            NCBI non-redundant protein database                              nr.gz
pdb           PDB protein structure database                                   pdb_seqres.txt
swiss         UniProt/Swiss-Prot database                                      uniprot_sprot.fasta.gz
trembl        UniProt/TrEMBL database                                          uniprot_trembl.fasta.gz
uniref100     UniRef100 database                                               uniref100.fasta.gz
uniref90      UniRef90 database                                                uniref90.fasta.gz
uniref50      UniRef50 database                                                uniref50.fasta.gz


Users' own databases

Users can do BLAST searches against their own fasta formatted sequence sets with the My BLAST WWW interface.

Another option is to use pb in Murska or in Hippu. With pb, your own databases are used by replacing the -d option with the option -dbnuc (for nucleotides) or -dbprot (for proteins). Example:

pb blastall -p blastn -dbnuc my_seq_set.fasta -i querys.fasta -o results.out

More information

Korpelainen Eija +358 9 457 2030 Eija.Korpelainen at csc.fi
Mattila Kimmo +358 9 457 2708 Kimmo.Mattila at csc.fi
Saren Ari-Matti +358 9 457 2282 Ari-Matti.Saren at csc.fi