
BLAST

Availability

  • NCBI BLAST version 2.2.25 is available on vuori.csc.fi and hippu.csc.fi.
  • NCBI BLAST version 2.2.24 is available on murska.csc.fi.
  • BLAST commands are included in the Embster graphical sequence analysis interface.


NOTE! A major change in the BLAST command syntax occurred when BLAST was updated to version 2.2.22.
The older BLAST versions are still available. Instructions for these commands can be found at the link below.


Description

BLAST (Basic Local Alignment Search Tool) is the most frequently used sequence homology search tool. Given a probe sequence (nucleotide or protein), BLAST compares it to a sequence database and picks out the sequences that show significant similarity to the probe. BLAST uses a heuristic search protocol, which makes searches much faster than with non-heuristic methods. However, the heuristics may cause BLAST to miss some significant hits.
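To make the idea of a heuristic search concrete, here is a minimal Python sketch of the seeding step only. It reports the positions where a query and a database sequence share an exact word of length k; nucleotide BLAST starts from exact word matches like these. Real BLAST also uses scored neighbourhood words for proteins, extends and scores the seeds, and computes E-values, none of which is shown. The function name find_seeds is hypothetical.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where both sequences share an exact k-letter word."""
    # Index every k-letter word of the subject sequence by its start positions.
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)

    # Scan the query; every shared word becomes a seed that a real search would extend.
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


print(find_seeds("ACGTACGT", "TTACGTAA", 4))  # → [(0, 2), (1, 3), (3, 1), (4, 2)]
print(find_seeds("ACGTACGT", "ACCTACCT", 4))  # → []
```

The second pair is 75% identical but shares no 4-letter word. A search that only examines seeded regions therefore never looks at it, which is the kind of miss the heuristics can cause.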

The command-line version of NCBI BLAST lets users adjust all BLAST parameters, use special methods such as PSI-BLAST and PHI-BLAST, and analyze large data sets.

On Murska and Vuori you can use the pb (Parallel BLAST) command for large sets of query sequences. pb splits a large search job into several subjobs that can be executed simultaneously (more below).


The most commonly used BLAST commands are:

Search Commands

  • blastn: search a nucleotide database with a nucleotide query sequence
  • blastp: search a protein database with a protein query sequence
  • blastx: search a protein database with a nucleotide query sequence (the query is translated)
  • psiblast: run an iterative search of a protein database with a protein query sequence
  • rpsblast: search a protein profile database with a protein query sequence
  • rpstblastn: search a protein profile database with a nucleotide query sequence (the query is translated)
  • tblastn: search a nucleotide database with a protein query sequence (the database is translated)
  • tblastx: search a nucleotide database with a nucleotide query sequence, comparing the protein translations of both the query and the database sequences
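The choice of search program follows from the molecule types of the query and the database, as listed above. The following is a small Python sketch of that mapping. The function name and the "nucl"/"prot" labels are assumptions for illustration; psiblast and the rpsblast variants are left out.

```python
def choose_blast_program(query_type: str, db_type: str, translated: bool = False) -> str:
    """Map query/database molecule types ("nucl" or "prot") to a BLAST+ search program.

    `translated` only matters for nucleotide-vs-nucleotide searches: it selects
    tblastx, which compares the protein translations of both sequences.
    """
    programs = {
        ("nucl", "nucl"): "tblastx" if translated else "blastn",
        ("prot", "prot"): "blastp",
        ("nucl", "prot"): "blastx",   # nucleotide query, protein database
        ("prot", "nucl"): "tblastn",  # protein query, nucleotide database
    }
    try:
        return programs[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"expected 'nucl' or 'prot', got {query_type!r}, {db_type!r}") from None


print(choose_blast_program("nucl", "prot"))  # → blastx
```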

Other BLAST commands

  • blastdbcmd: retrieve a sequence or a set of sequences from BLAST databases
  • makeblastdb: create a new BLAST database
  • blast_formatter: reformat BLAST results that were saved in BLAST archive format


Usage


At CSC, BLAST searches can be executed in several ways:


Searches
To use the latest BLAST on Vuori, Hippu or Murska, first run the setup command (note the + sign at the end of the command):
module load blast+
After that, you can start using the BLAST commands listed above. For example, the following command searches the UniProt database for homologs of a protein sequence:
blastp -query proteinseq.fasta -db uniprot -out result.txt
Use the -help option to see which command-line options are available for a given BLAST command. For example:
blastp -help
For example, the command:
blastp -query proteinseq.fasta -evalue 0.001 -db uniprot -outfmt 7 -out result.table
runs the same search as described above, except that the E-value threshold is set to 0.001 (-evalue 0.001) and the output is written as a table with comment lines (-outfmt 7).
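If you post-process tabular results in a script, a minimal Python parser for the default -outfmt 7 columns might look like the following. It assumes the standard 12-column layout (query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score) and skips the "#" comment lines. The sample rows are invented for illustration, and parse_tabular is a hypothetical helper name.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One row of the default 12-column tabular output (-outfmt 6 or 7)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_tabular(lines, max_evalue=float("inf")):
    """Return Hit tuples from tabular BLAST output, skipping '#' comment lines."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        q, s, pident, length, mism, gaps, qs, qe, ss, se, evalue, bits = line.split("\t")[:12]
        hit = Hit(q, s, float(pident), int(length), int(mism), int(gaps),
                  int(qs), int(qe), int(ss), int(se), float(evalue), float(bits))
        # Keep only hits at or below the requested E-value cutoff.
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits


# Invented sample in the -outfmt 7 layout (not real search results).
SAMPLE = """# BLASTP 2.2.25+
# Query: query1
# Database: uniprot
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 2 hits found
query1\thitA\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t350
query1\thitB\t40.00\t150\t85\t4\t20\t165\t30\t178\t0.0005\t45.1
"""

for hit in parse_tabular(SAMPLE.splitlines(), max_evalue=1e-10):
    print(hit.sseqid, hit.pident, hit.evalue)  # → hitA 98.5 1e-100
```

On real output you would pass an open file instead, e.g. `with open("result.table") as fh: hits = parse_tabular(fh, max_evalue=1e-5)`.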


Using pb (Parallel BLAST) at CSC

If your query set contains fewer than 20 sequences, Hippu is probably the most efficient platform for the search. However, if your query set contains hundreds or thousands of sequences, the vuori.csc.fi or murska.csc.fi clusters are more efficient. For such massive BLAST searches, you can use the pb command.

On Vuori, Murska and Hippu, you can run any BLAST search command through the pb command. pb (Parallel BLAST) is designed for situations where the query file contains several sequences. It splits the query task into several subjobs that can be run simultaneously, using the server's resources very efficiently. For large sets of query sequences, pb can speed up the search by up to 50-fold. Two sample pb commands for vuori.csc.fi and murska.csc.fi:

module load blast+

pb blastn -db embl_others -query 100_ests.fasta -out results.out

pb psiblast -db swiss -query protseqs.fasta -num_iterations 3 -out results.out
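The idea behind pb, splitting a multi-sequence query file into subjobs, can be sketched in Python as follows. This is not pb's actual implementation: the round-robin split, the number of subjobs and the function names are assumptions chosen for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(records, n_jobs):
    """Deal records round-robin into at most n_jobs non-empty batches."""
    batches = [[] for _ in range(n_jobs)]
    for i, record in enumerate(records):
        batches[i % n_jobs].append(record)
    return [batch for batch in batches if batch]


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text with a fixed line width."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


queries = list(read_fasta(">q1\nACGT\n>q2\nGGCC\nTTAA\n>q3\nAAAA\n>q4\nCCCC\n>q5\nGGGG\n".splitlines()))
for n, batch in enumerate(split_records(queries, 2), start=1):
    # A real workflow would write each batch to its own file, e.g. chunk_1.fasta.
    print(f"subjob {n}:", [header for header, _ in batch])
```

Each batch could then be searched as an independent job (for example blastn -db embl_others -query chunk_1.fasta -out chunk_1.out), and the per-chunk outputs concatenated afterwards. pb automates this bookkeeping for you.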


Using your own BLAST databases with pb

The pb program also allows users to run BLAST searches against their own FASTA-formatted sequence sets. To do this, replace the -db option with -dbnuc (for nucleotide sequences) or -dbprot (for protein sequences). Example:

pb blastn -dbnuc my_seq_set.fasta -query querys.fasta -out results.out
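pb has to be told whether your own sequence set is nucleotide (-dbnuc) or protein (-dbprot), so a script that wraps pb may want to guess the alphabet. Below is a rough Python heuristic. It assumes that a set in which at least 90% of the letters are A, C, G, T, U or N is nucleotide. The threshold and the helper name are arbitrary choices, not part of pb.

```python
# Assumption: only these letters count as nucleotide; other IUPAC codes count as protein.
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_pb_db_option(sequences, threshold=0.9):
    """Guess "-dbnuc" or "-dbprot" from the fraction of nucleotide letters."""
    letters = [c for seq in sequences for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("no sequence letters found")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "-dbnuc" if fraction >= threshold else "-dbprot"


print(guess_pb_db_option(["ACGTTGCAAN", "ggccaattgc"]))  # → -dbnuc
print(guess_pb_db_option(["MKTAYIAKQRQISFVKSHFSRQ"]))  # → -dbprot
```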

Using genome data from Ensembl with pb

The pb command can also automatically retrieve a species-specific dataset from the Ensembl or Ensembl Genomes servers and use it as the search database. To do this, replace the -db option with -ensembl_dna (retrieves the genomic DNA), -ensembl_cdna (retrieves the cDNA sequences) or -ensembl_prot (retrieves the protein sequences). The Latin name of the species is given as the argument of the Ensembl option. Use an underscore (_) instead of a space in the species name.

For example, to compare a set of nucleotide sequences against the human genome, you could use a command like:
pb blastn -query dna_fragments.fasta -ensembl_dna homo_sapiens -out human_hits.txt
To compare the same DNA fragments against the protein sequences predicted from the chicken genome, use blastx (nucleotide query against a protein database):
pb blastx -query dna_fragments.fasta -ensembl_prot gallus_gallus -out chicken_hits.txt
To list the nearly 300 species available in the Ensembl and Ensembl Genomes databases, use the command:
ensemblfetch -names
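When launching pb from a Python script, the species-name rule (underscores instead of spaces) can be applied automatically. The following is a minimal sketch that assumes lower-case names, as in the examples above. pb_ensembl_command is a hypothetical helper, and it uses only the pb options named in this section.

```python
import shlex

# pb options for Ensembl datasets, as named in the text above.
ENSEMBL_OPTIONS = {"dna": "-ensembl_dna", "cdna": "-ensembl_cdna", "prot": "-ensembl_prot"}


def pb_ensembl_command(program, query, species, dataset, out):
    """Build the argument list for a pb search against an Ensembl species dataset."""
    # Underscores replace spaces; lower case follows the examples (homo_sapiens, gallus_gallus).
    species_arg = "_".join(species.split()).lower()
    return ["pb", program, "-query", query, ENSEMBL_OPTIONS[dataset], species_arg, "-out", out]


cmd = pb_ensembl_command("blastn", "dna_fragments.fasta", "Homo sapiens", "dna", "human_hits.txt")
print(shlex.join(cmd))
# → pb blastn -query dna_fragments.fasta -ensembl_dna homo_sapiens -out human_hits.txt
```

On the server (after module load blast+), the list could be passed to subprocess.run(cmd, check=True).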



BLAST databases at CSC


Below is a list of the BLAST databases maintained on CSC's servers.

Name          Source files             Database
Nucleotides
arabidopsisN  ArabidopsisN.Z           Arabidopsis thaliana nucleotide sequences
embl_est      est files                EMBL EST division
embl_gss      gss files                EMBL GSS division
embl_htg      htg files                EMBL HTG division
embl_others   other files              EMBL excluding the EST, GSS and HTG divisions, and the WGS sets
emblnew       cum files                EMBL updates
nt            nt.gz                    NCBI non-redundant nucleotide database
refseq        rna.fna.gz files         NCBI RefSeq RNA database
refseq_con    all chromosomes          NCBI RefSeq human contigs
Proteins
nr            nr.gz                    NCBI non-redundant protein database
pdb           pdb_seqres.txt           PDB protein structure database
swiss         uniprot_sprot.fasta.gz   UniProt/Swiss-Prot database
trembl        uniprot_trembl.fasta.gz  UniProt/TrEMBL database
uniref100     uniref100.fasta.gz       UniRef100 database
uniref90      uniref90.fasta.gz        UniRef90 database
uniref50      uniref50.fasta.gz        UniRef50 database
Ensembl genomes
Select one of the species with the pb options -ensembl_dna, -ensembl_cdna or -ensembl_pep (source: ftp://ftp.ensembl.org/)

Users' own BLAST databases

CSC offers three ways to run BLAST queries against your own sequence sets:

1. MyBLAST commands in Embster

Users can run BLAST searches against their own FASTA-formatted sequence sets with the my_blast command in Embster.


2. pb BLAST and gb BLAST

With pb, your own databases are used by replacing the -db option with -dbnuc (for nucleotide sequences) or -dbprot (for protein sequences). Example:
pb blastn -dbnuc my_seq_set.fasta -query querys.fasta -out results.out


3. makeblastdb


You can use the makeblastdb command to create a BLAST search database from your own ASN.1 or FASTA-formatted sequence set. If you use FASTA files, note that makeblastdb assumes that the header lines of the FASTA file contain NCBI-style names for the sequences (e.g. gnl|db_name|sequence_id).
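To illustrate the header format, the following Python sketch rewrites plain FASTA headers such as >seq1 description into the gnl|db_name|sequence_id form. It is only an illustration and is not guaranteed to produce the same output as seqret -osf ncbi. The database label mydb and the function name are hypothetical.

```python
def to_ncbi_header(line, db_name):
    """Rewrite '>id description' as '>gnl|db_name|id description'.

    Expects lines without trailing newlines. Sequence lines, and headers whose
    identifier already contains '|', are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    parts = line[1:].split(maxsplit=1)
    if not parts:
        raise ValueError("empty FASTA header")
    seq_id = parts[0]
    if "|" in seq_id:
        return line
    new_header = f">gnl|{db_name}|{seq_id}"
    return f"{new_header} {parts[1]}" if len(parts) > 1 else new_header


fasta = [">seq1 heat shock protein", "MKTAYIAKQR", ">seq2", "MSTNPKPQRK"]
print("\n".join(to_ncbi_header(line, "mydb") for line in fasta))
```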

You can use the EMBOSS command seqret to convert a normal FASTA file into an NCBI-formatted FASTA file. For example:

module load emboss
seqret my_seqs.fasta my_seqs_ncbi.fasta -osf ncbi
Then create the BLAST database with the command below (-dbtype prot is the default; use -dbtype nucl for nucleotide sequences):
makeblastdb -in my_seqs_ncbi.fasta -dbtype prot -out my_seqs -parse_seqids
You can then launch a search command, for example:
blastp -query proteinseq.fasta -db my_seqs -out result.txt

Note that if the BLAST database is not located in the directory where the search command is executed, its location must be given with the BLASTDB environment variable. The setenv syntax below is for csh/tcsh; in bash, use export BLASTDB=/path/to/your/blastdatabase instead.

setenv BLASTDB /path/to/your/blastdatabase
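In a Python workflow, BLASTDB can be set for the BLAST process only, instead of in your login shell. The following is a minimal sketch using the standard subprocess module. The directory and function names are placeholders, and run_blastp assumes the BLAST+ commands are on the PATH (e.g. after module load blast+).

```python
import os
import subprocess


def blast_env(db_dir):
    """Return a copy of the current environment with BLASTDB set to db_dir."""
    env = dict(os.environ)
    env["BLASTDB"] = os.path.abspath(db_dir)
    return env


def run_blastp(query, db, out, db_dir):
    """Run blastp with BLASTDB set only for this child process (needs BLAST+ on the PATH)."""
    cmd = ["blastp", "-query", query, "-db", db, "-out", out]
    subprocess.run(cmd, env=blast_env(db_dir), check=True)


# Hypothetical usage:
# run_blastp("proteinseq.fasta", "my_seqs", "result.txt", "/path/to/your/blastdatabase")
```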

Note that the -dbprot and -dbnuc options of the pb command (described above) build the database automatically.


More information



Korpelainen Eija Eija.Korpelainen at csc.fi
Saren Ari-Matti Ari-Matti.Saren at csc.fi
Mattila Kimmo Kimmo.Mattila at csc.fi