Version
- NCBI BLAST version 2.2.22 is available in murska.csc.fi.
- NCBI BLAST version 2.2.22 is available in hippu.csc.fi.
- A WWW interface to BLAST is available in the Scientist's Interface.
The older BLAST versions are still available. Instructions for these commands can be found from the link below.
Description
BLAST (Basic Local Alignment Search Tool) is the most frequently used sequence homology search tool. Given a probe sequence (nucleotide or protein), BLAST compares it to a sequence database and picks out sequences with significant similarity to the probe sequence. BLAST uses a heuristic search protocol, which makes searches very fast compared to non-heuristic methods. The heuristics may, however, cause BLAST to miss some significant hits.
The command line version of NCBI-BLAST allows a user to modify all parameters of BLAST, to use special methods like PSI-BLAST and PHI-BLAST, and to analyze large data sets.
In Murska you can use the pb (Parallel BLAST) command for large sets of query sequences. The pb program splits a large search job into several subjobs that can be executed simultaneously (more below).
The most commonly used BLAST commands are:
Search Commands
- blastn: search a nucleotide database with a nucleotide query sequence
- blastp: search a protein database with a protein query sequence
- blastx: search a protein database with a nucleotide query sequence
- psiblast: iteratively search a protein database with a protein query sequence
- rpsblast: search a protein profile database with a protein query sequence
- rpstblastn: search a protein profile database with a nucleotide query sequence
- tblastn: search a nucleotide database with a protein query sequence
- tblastx: search a nucleotide database with a nucleotide query sequence, using the protein translations of both the query and the database sequences
Database commands
- blastdbcmd: retrieve a sequence or a set of sequences from BLAST databases
- makeblastdb: create a new BLAST database
Usage
Searches
To use the latest BLAST in Hippu or Murska, first give the setup command (note the + sign at the end of the command):
module load blast+
After that you can start using the BLAST commands listed above. For example, the following command searches the UniProt database for homologs of a protein sequence:
blastp -query proteinseq.fasta -db uniprot -out result.txt
You can use the -help option to see what command line options are available for a certain BLAST command. For example:
blastp -help
For example, the command:
blastp -query proteinseq.fasta -evalue 0.001 -db uniprot -outfmt 7 -out result.table
would run the same search as described above, except that the e-value threshold would be set to 0.001 (-evalue 0.001) and the output would be printed as a table (-outfmt 7).
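The table written with -outfmt 7 can be post-processed with standard Unix tools. The following is a minimal sketch, assuming the default 12-column tabular layout (query id in column 1, subject id in column 2, e-value in column 11), comment lines starting with #, and hits listed best first for each query. The function name best_hits is hypothetical and not part of BLAST.

```shell
# Hypothetical helper: print query id, subject id and e-value of the
# first reported hit of each query in a BLAST table (-outfmt 7).
# Assumes the default 12 tab-separated columns; '#' lines are comments.
best_hits() {
    # $1 = table file written with -outfmt 7
    awk -F '\t' '!/^#/ && NF >= 12 && !seen[$1]++ {
        print $1 "\t" $2 "\t" $11
    }' "$1"
}

# Usage:
# best_hits result.table
```

If a non-default column set has been requested, the column numbers in the sketch have to be adjusted accordingly.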
Usage of pb (Parallel BLAST) in Murska and Hippu
In Murska and Hippu, you can execute any BLAST search command through the pb command. pb (Parallel BLAST) is designed for situations where the query file includes several sequences.
In murska.csc.fi it splits the query task into several subjobs that can be run simultaneously, using the resources of the server very effectively. For large sets of query sequences, pb can speed up the search up to 50-fold. Two sample pb commands for murska.csc.fi:
module load blast+
pb blastn -db embl -query 100_ests.fasta -out results.out
pb psiblast -db swiss -query protseqs.fasta -num_iterations 3 -out results.out
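The idea of dividing a query set into subjobs can be illustrated with a small shell function. This is only a sketch of the general principle, not the way pb is implemented; the function name split_fasta, the chunk size and the output file naming are all hypothetical.

```shell
# Hypothetical sketch: split a multi-sequence FASTA file into chunks
# of N sequences each. Illustrates the principle only; not pb itself.
split_fasta() {
    # $1 = input FASTA file, $2 = sequences per chunk, $3 = output prefix
    awk -v n="$2" -v prefix="$3" '
        /^>/ {
            # Start a new chunk file every n header lines
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s.%03d.fasta", prefix, ++chunk)
            }
            count++
        }
        out != "" { print > out }
    ' "$1"
}

# Usage:
# split_fasta 100_ests.fasta 10 chunk
```

Each resulting chunk file could then be given to an ordinary BLAST command with the -query option, and the separate result files combined afterwards.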
The pb program also allows users to do BLAST searches against their own fasta formatted sequence sets.
This is done by replacing the -db option with option -dbnuc (for nucleotides) or -dbprot
(for proteins). Example:
pb blastn -dbnuc my_seq_set.fasta -query querys.fasta -out results.out
BLAST databases in Murska and Hippu
Below is a list of BLAST databases maintained in the Hippu and Murska servers.
| Name | database | source files |
|---|---|---|
| Nucleotides | ||
| arabidopsisN | Arabidopsis thaliana nucleotide sequences | ArabidopsisN.Z |
| embl_est | EMBL EST division | est files |
| embl_gss | EMBL GSS division | gss files |
| embl_htg | EMBL HTG division | htg files |
| embl_others | EMBL excluding the EST, GSS and HTG divisions, and the WGS sets | other files |
| emblnew | EMBL updates | cum files |
| nt | NCBI non-redundant nucleotide database | nt.gz |
| refseq | NCBI RefSeq RNA database | rna.fna.gz files |
| refseq_con | NCBI RefSeq human contigs | all chromosomes |
| Proteins | ||
| nr | NCBI non-redundant protein database | nr.gz |
| pdb | PDB protein structure database | pdb_seqres.txt |
| swiss | UniProt/Swiss-Prot database | uniprot_sprot.fasta.gz |
| trembl | UniProt/TrEMBL database | uniprot_trembl.fasta.gz |
| uniref100 | UniRef100 database | uniref100.fasta.gz |
| uniref90 | UniRef90 database | uniref90.fasta.gz |
| uniref50 | UniRef50 database | uniref50.fasta.gz |
Users' own BLAST databases
CSC offers three ways to do BLAST queries against users' own sequence sets:
1. MyBLAST WWW interface
Users can do BLAST searches against their own fasta formatted sequence sets with the MyBLAST WWW interface.
2. pb BLAST
In pb BLAST, own databases are used by replacing the -db option with the option -dbnuc (for nucleotides) or -dbprot (for proteins). Example:
pb blastn -dbnuc my_seq_set.fasta -query querys.fasta -out results.out
3. makeblastdb
You can use the makeblastdb command to create a BLAST search database from your own asn1 or fasta formatted sequence set. If you use fasta formatted sequence files, please note that the makeblastdb command assumes that the header lines in the fasta file contain NCBI style names for the sequences (e.g. gnl|db_name|sequence_id).
You can use the EMBOSS command seqret to convert a normal fasta file to an NCBI formatted fasta file. For example:
module load emboss
seqret my_seqs.fasta my_seqs_ncbi.fasta -osf ncbi
After this the BLAST database can be created with the command:
makeblastdb -in my_seqs_ncbi.fasta -out my_seqs -parse_seqids
After this you can launch a search command. For example:
blastp -query proteinseq.fasta -db my_seqs -out result.txt
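To make the NCBI style naming (gnl|db_name|sequence_id) more concrete, the sketch below rewrites the header lines of a plain fasta file with awk. It is an illustration only and not a replacement for seqret: the function name to_ncbi_ids and the database name mydb are hypothetical, and the sketch assumes that each header starts with a unique identifier and that the database name contains no & or backslash characters.

```shell
# Hypothetical helper: prefix every FASTA identifier with gnl|<db_name>|
# so that ">seq1 description" becomes ">gnl|<db_name>|seq1 description".
to_ncbi_ids() {
    # $1 = input FASTA file, $2 = database name to embed in the identifiers
    awk -v db="$2" '
        /^>/ { sub(/^>/, ">gnl|" db "|") }   # rewrite header lines only
        { print }                            # sequence lines pass through
    ' "$1"
}

# Usage:
# to_ncbi_ids my_seqs.fasta mydb > my_seqs_ncbi.fasta
```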
Note that if the BLAST database is not located in the directory where the search command is executed, the location of the database must be defined with the environment variable BLASTDB:
setenv BLASTDB /path/to/your/blastdatabase
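The setenv syntax applies to csh type shells. In a Bourne type shell (sh, bash) the same variable would be set as sketched below; the path is a placeholder, not a real location.

```shell
# Bourne type shell equivalent of the setenv command.
# /path/to/your/blastdatabase is a placeholder for the database directory.
BLASTDB=/path/to/your/blastdatabase
export BLASTDB
```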
Note that the -dbprot and -dbnuc options of the pb command (described above) do the database building operations automatically.
More information
| Korpelainen Eija | eija.korpelainen at csc.fi |
| Saren Ari-Matti | ari-matti.saren at csc.fi |
| Mattila Kimmo | kimmo.mattila at csc.fi |