Ensembl at CSC

Ensembl and Ensembl Genomes are heavily used data sources in molecular biology. CSC does not maintain complete copies of these databases. However, some applications installed on the CSC servers can contact and use the Ensembl database system at EBI. Users can also download Ensembl data and install it on the CSC servers for their own use.


Using Ensembl with EMBOSS

EMBOSS can retrieve sequence data directly from the Ensembl MySQL server at EBI. Ensembl sequences are addressed with a slightly modified USA (Uniform Sequence Address) definition of the form ensembl:species:ID-code. The species can be given by its Latin name, and for some species the English name also works. Some examples:

seqret ensembl:mouse:ENSMUST00000103109

infoseq ensembl:human:ENST00000262160

showseq ensembl:homo_sapiens:ENST00000262160

seqret 'ensemblgenomes:Escherichia coli K12:EBESCT00000004007'

seqret 'ensemblgenomes:Escherichia coli DH10B:EBESCT00000011809'

seqret 'ensemblgenomes:Schizosaccharomyces pombe:SPAC2F7.03c-1'
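
The ensembl:species:ID-code form shown above can also be assembled in a script. The following is a minimal sketch only: ensembl_usa is a hypothetical helper, not part of EMBOSS, and the lowercase, underscore-joined species name is an assumption modelled on the homo_sapiens example. It is not meant for the ensemblgenomes addresses, which keep spaces and capitals in the examples above.

```shell
# ensembl_usa: hypothetical helper (not an EMBOSS command) that prints
# an address of the form ensembl:species:ID-code.
ensembl_usa() {
    usa_species=$1
    usa_id=$2
    # Assumption: lowercase the name and join its words with underscores,
    # e.g. "Homo sapiens" -> "homo_sapiens".
    usa_species=$(printf '%s' "$usa_species" | tr '[:upper:]' '[:lower:]' | tr ' ' '_')
    printf 'ensembl:%s:%s\n' "$usa_species" "$usa_id"
}

# Possible use with an EMBOSS program (requires EMBOSS and network access):
#   seqret "$(ensembl_usa 'Homo sapiens' ENST00000262160)"
```

The helper only builds the address string, so the address can be printed and checked before it is passed to seqret, infoseq or showseq.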


Retrieving FASTA files from the Ensembl FTP site

The ensemblfetch command retrieves FASTA-formatted sequence data for a given species from the Ensembl FTP site. Specify the species by its Latin name, with the space replaced by an underscore (_), e.g. homo_sapiens. By default the command retrieves genomic DNA (-type dna), but you can also retrieve transcripts (-type cdna) or protein sequences (-type pep). For example:

ensemblfetch -type cdna bos_taurus

A list of all available species names can be retrieved with the command:

ensemblfetch -names
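
If all three sequence types are needed for one species, the calls can be generated in a loop. This is only a sketch: ensemblfetch_commands is a hypothetical wrapper name, and it prints the commands instead of running them so that they can be reviewed first. It uses only the -type values described above.

```shell
# ensemblfetch_commands: hypothetical wrapper that prints one ensemblfetch
# call per sequence type (dna, cdna, pep) for the given species.
# The species name must already be in the underscore form, e.g. bos_taurus.
ensemblfetch_commands() {
    fetch_species=$1
    for fetch_type in dna cdna pep; do
        printf 'ensemblfetch -type %s %s\n' "$fetch_type" "$fetch_species"
    done
}

# To execute the printed commands on a server where ensemblfetch is
# available, the output could be piped to the shell:
#   ensemblfetch_commands bos_taurus | sh
```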


Running BLAST searches against Ensembl data

The pb script associated with the latest BLAST+ can automatically fetch a species-specific data set from the Ensembl database to be used as the search database. Use the option -ensembl_dna species_name to search against genomic DNA, -ensembl_cdna species_name for cDNA transcripts and -ensembl_prot species_name for the species-specific protein sequence set. For example:

module load blast+
pb blastn -query queryseq.fasta -ensembl_cdna gallus_gallus -out results.txt
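
The sequence types used by ensemblfetch (dna, cdna, pep) correspond to the three pb options above, with pep matching -ensembl_prot. A script that handles both tools could keep that mapping in one place. The sketch below assumes a hypothetical helper name, pb_ensembl_option, which is not part of pb or BLAST+.

```shell
# pb_ensembl_option: hypothetical helper that maps an ensemblfetch-style
# sequence type to the corresponding pb option.
pb_ensembl_option() {
    case $1 in
        dna)  printf '%s\n' '-ensembl_dna' ;;
        cdna) printf '%s\n' '-ensembl_cdna' ;;
        pep)  printf '%s\n' '-ensembl_prot' ;;
        *)    echo "unknown sequence type: $1" >&2; return 1 ;;
    esac
}

# Possible use (requires the blast+ module to be loaded):
#   pb blastn -query queryseq.fasta "$(pb_ensembl_option cdna)" gallus_gallus -out results.txt
```

An unknown type returns a non-zero status, so a calling script can stop before a search is started with a wrong option.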


Installing an Ensembl database on the MySQL server of CSC

You can install your own copy of a specific Ensembl database on the kaivos.csc.fi database server at CSC. However, you must first apply for a new database and a database account. For more information about getting access to the database server of CSC, see chapter 7 of the CSC Data services guide (http://www.csc.fi/english/pages/data-services/databases/index_html).