BLAST

Version

  • NCBI BLAST version 2.2.22 is available on murska.csc.fi.
  • NCBI BLAST version 2.2.22 is available on hippu.csc.fi.
  • A WWW interface to BLAST is available in the Scientist's Interface.


NOTE! The BLAST command syntax changed substantially when BLAST was updated to version 2.2.22.
The older BLAST versions are still available. Instructions for the older commands can be found via the link below.


Description

BLAST (Basic Local Alignment Search Tool) is the most frequently used sequence homology search tool. Given a probe sequence (nucleotide or protein), BLAST compares it to a sequence database and picks out sequences with significant similarity to the probe sequence. BLAST uses a heuristic search protocol, which makes searches very fast compared to non-heuristic methods. However, the heuristics may cause BLAST to miss some significant hits.

The command line version of NCBI-BLAST allows a user to modify all parameters of BLAST, to use special methods like PSI-BLAST and PHI-BLAST, and to analyze large data sets.

On Murska you can use the pb (Parallel BLAST) command for large sets of query sequences. The pb program splits a large search job into several subjobs that can be executed simultaneously (more below).


The most commonly used BLAST commands are:

Search Commands

  • blastn: searches a nucleotide database for hits to a nucleotide sequence
  • blastp: searches a protein database for hits to a protein sequence
  • blastx: searches a protein database for hits to a nucleotide sequence
  • psiblast: runs an iterative search of a protein database for a protein sequence
  • rpsblast: searches a protein profile database for hits to a protein sequence
  • rpstblastn: searches a protein profile database for hits to a nucleotide sequence
  • tblastn: searches a nucleotide database for hits to a protein sequence
  • tblastx: searches a nucleotide database for hits to a nucleotide sequence, comparing the protein translations of both the query and the database sequences

Database commands

  • blastdbcmd: retrieves a sequence or a set of sequences from BLAST databases
  • makeblastdb: creates a new BLAST database


Usage


Searches
To use the latest BLAST on Hippu or Murska, first run the setup command (note the + sign at the end of the command):
module load blast+
After that you can start using the BLAST commands listed above. For example, the following command searches the UniProt database for homologs of a protein sequence:
blastp -query proteinseq.fasta -db uniprot -out result.txt
You can use the -help option to see which command line options are available for a given BLAST command. For example:
blastp -help
For example, the command
blastp -query proteinseq.fasta -evalue 0.001 -db uniprot -outfmt 7 -out result.table
runs the same search as described above, except that the e-value threshold is set to 0.001 (-evalue 0.001) and the output is printed as a table (-outfmt 7).
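
The tabular output is convenient for post-processing with standard Unix tools. The following is a minimal sketch, assuming the default -outfmt 7 column order (query id, subject id, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and assuming that the hits of each query are listed best first. The sample table and its identifiers (q1, sp_A and so on) are made up for illustration and are not real BLAST output.

```shell
# Create a small made-up table in the -outfmt 7 layout:
# tab-separated data lines plus comment lines starting with '#'.
tmpdir=$(mktemp -d)
table="$tmpdir/result.table"
{
  printf '# illustrative sample, not real BLAST output\n'
  printf 'q1\tsp_A\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400\n'
  printf 'q1\tsp_B\t45.00\t180\t99\t2\t5\t184\t7\t186\t2e-20\t95.1\n'
  printf '# comment lines may also appear between queries\n'
  printf 'q2\tsp_C\t77.20\t150\t34\t1\t1\t150\t10\t159\t3e-60\t210\n'
} > "$table"

# Skip comment lines and keep only the first (best) hit of each query.
# Printed columns: query id, subject id, e-value.
best_hits=$(awk -F'\t' -v OFS='\t' '!/^#/ && !seen[$1]++ { print $1, $2, $11 }' "$table")
printf '%s\n' "$best_hits"

rm -rf "$tmpdir"
```

With a real search you would point awk at your own result.table instead of the sample file.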


Usage of pb (Parallel BLAST) in Murska and Hippu

If your query sequence set contains fewer than 20 sequences, Hippu is probably the most effective platform for the search. However, if your query set contains hundreds or thousands of sequences, the murska.csc.fi cluster is more effective. For such massive BLAST searches you can also use the pb command.
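
To see which case applies, you can count the header lines in the query file. A minimal sketch follows; the file name and the function names count_queries and suggest_platform are hypothetical, and the limit of 20 is simply the rule of thumb given above.

```shell
# Count the sequences in a FASTA file (each sequence starts with a '>' line).
count_queries() {
  grep -c '^>' "$1"
}

# Apply the rule of thumb: fewer than 20 query sequences -> Hippu, otherwise Murska.
suggest_platform() {
  if [ "$1" -lt 20 ]; then
    echo "hippu"
  else
    echo "murska"
  fi
}

# Demonstration with a small made-up query file.
demo_dir=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > "$demo_dir/queries.fasta"
n_queries=$(count_queries "$demo_dir/queries.fasta")
echo "$n_queries query sequences: $(suggest_platform "$n_queries")"
rm -rf "$demo_dir"
```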

On Murska and Hippu, you can execute any BLAST search command through the pb command. pb (Parallel BLAST) is designed for situations where the query file includes several sequences. On murska.csc.fi it splits the query task into several subjobs that can be run simultaneously, using the resources of the server very effectively. For large sets of query sequences, pb can speed up the search up to 50-fold. Two sample pb commands for murska.csc.fi:

module load blast+

pb blastn -db embl -query 100_ests.fasta -out results.out

pb psiblast -db swiss -query protseqs.fasta -num_iterations 3 -out results.out
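
The internals of pb are not described here, but the underlying idea of splitting a query set into independent subjobs can be sketched with awk. In this sketch the chunk size, the file names and the sample sequences are all assumptions made for illustration; pb itself does this work for you.

```shell
work_dir=$(mktemp -d)

# Small made-up multi-FASTA query file with five sequences.
cat > "$work_dir/queries.fasta" <<'EOF'
>seq1
ACGTACGT
>seq2
GGCCGGCC
>seq3
TTAATTAA
>seq4
CCGGCCGG
>seq5
ATATATAT
EOF

# Write the sequences into chunk files holding at most n sequences each.
awk -v n=2 -v prefix="$work_dir/chunk_" '
  /^>/ {
    if (seen % n == 0) {
      if (out != "") close(out)
      out = sprintf("%s%03d.fasta", prefix, ++chunk)
    }
    seen++
  }
  { print > out }
' "$work_dir/queries.fasta"

chunk_count=$(ls "$work_dir"/chunk_*.fasta | wc -l | tr -d ' ')
last_chunk_seqs=$(grep -c '^>' "$work_dir/chunk_003.fasta")
echo "$chunk_count chunks, $last_chunk_seqs sequence(s) in the last one"

# Each chunk could then be searched as a separate job, for example:
#   blastn -db embl -query chunk_001.fasta -out chunk_001.out
rm -rf "$work_dir"
```

Because every query sequence is searched independently, the per-chunk result files can simply be concatenated afterwards.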

The pb program also allows users to run BLAST searches against their own FASTA-formatted sequence sets. This is done by replacing the -db option with the -dbnuc option (for nucleotides) or the -dbprot option (for proteins). Example:

pb blastn -dbnuc my_seq_set.fasta -query querys.fasta -out results.out


BLAST databases in Murska and Hippu


Below is a list of the BLAST databases maintained on the Hippu and Murska servers.

Name          Database                                                         Source files

Nucleotides
arabidopsisN  Arabidopsis thaliana nucleotide sequences                        ArabidopsisN.Z
embl_est      EMBL EST division                                                est files
embl_gss      EMBL GSS division                                                gss files
embl_htg      EMBL HTG division                                                htg files
embl_others   EMBL excluding the EST, GSS and HTG divisions, and the WGS sets  other files
emblnew       EMBL updates                                                     cum files
nt            NCBI non-redundant nucleotide database                           nt.gz
refseq        NCBI RefSeq RNA database                                         rna.fna.gz files
refseq_con    NCBI RefSeq human contigs                                        all chromosomes

Proteins
nr            NCBI non-redundant protein database                              nr.gz
pdb           PDB protein structure database                                   pdb_seqres.txt
swiss         UniProt/Swiss-Prot database                                      uniprot_sprot.fasta.gz
trembl        UniProt/TrEMBL database                                          uniprot_trembl.fasta.gz
uniref100     UniRef100 database                                               uniref100.fasta.gz
uniref90      UniRef90 database                                                uniref90.fasta.gz
uniref50      UniRef50 database                                                uniref50.fasta.gz




Users' own BLAST databases

CSC offers three ways to run BLAST queries against users' own sequence sets:

1. MyBLAST WWW interface

Users can run BLAST searches against their own FASTA-formatted sequence sets with the MyBLAST WWW interface.

2. pb BLAST

In pb, your own databases are used by replacing the -db option with the -dbnuc option (for nucleotides) or the -dbprot option (for proteins). Example:
pb blastn -dbnuc my_seq_set.fasta -query querys.fasta -out results.out


3. makeblastdb


You can use the makeblastdb command to create a BLAST search database from your own ASN.1 or FASTA-formatted sequence set. If you use FASTA-formatted sequence files, please note that makeblastdb assumes that the header lines in the FASTA file contain NCBI-style names for the sequences (e.g. gnl|db_name|sequence_id).

You can use the EMBOSS command seqret to convert a normal FASTA file to an NCBI-formatted FASTA file. For example:

module load emboss
seqret my_seqs.fasta my_seqs_ncbi.fasta -osf ncbi
After this, the BLAST database can be created with the command:
makeblastdb -in my_seqs_ncbi.fasta -out my_seqs -parse_seqids
You can then launch a search command. For example:
blastp -query proteinseq.fasta -db my_seqs -out result.txt
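
If EMBOSS is not at hand, the header conversion step can be approximated with awk. This is only a sketch: it assumes that the first word of each header line is the sequence identifier, and the database name my_db and the sample sequences are hypothetical. Its output is not guaranteed to match what seqret writes, so the seqret route described above remains the documented one.

```shell
fasta_dir=$(mktemp -d)

# Small made-up protein FASTA file with plain identifiers.
printf '>protA first test protein\nMKTAYIAK\n>protB second test protein\nMSDNELQK\n' \
  > "$fasta_dir/my_seqs.fasta"

# Prefix each identifier with gnl|my_db| and copy the sequence lines unchanged.
awk -v db="my_db" '
  /^>/ { sub(/^>/, ">gnl|" db "|") }
  { print }
' "$fasta_dir/my_seqs.fasta" > "$fasta_dir/my_seqs_ncbi.fasta"

ncbi_headers=$(grep '^>' "$fasta_dir/my_seqs_ncbi.fasta")
printf '%s\n' "$ncbi_headers"

rm -rf "$fasta_dir"
```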

Note that if the BLAST database is not located in the directory where the search command is executed, the location of the database must be defined with the environment variable BLASTDB:

setenv BLASTDB /path/to/your/blastdatabase
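
The setenv command above is csh/tcsh syntax. In sh-compatible shells such as bash, the same variable is set with export. A minimal sketch, in which the directory is a placeholder:

```shell
# sh/bash equivalent of the setenv line; the path is a placeholder.
export BLASTDB="$HOME/blastdb"

# Child processes (such as a blastp run) now see the variable.
blastdb_seen_by_child=$(sh -c 'printf "%s" "$BLASTDB"')
echo "BLASTDB is set to: $blastdb_seen_by_child"
```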

Note that the -dbprot and -dbnuc options of the pb command (described above) do the database building operations automatically.


More information



Korpelainen Eija eija.korpelainen at csc.fi
Saren Ari-Matti ari-matti.saren at csc.fi
Mattila Kimmo kimmo.mattila at csc.fi