Ensembl and Ensembl Genomes are heavily used data sources in molecular biology. CSC does not maintain complete copies of these databases. However, some applications installed on the CSC servers can contact and use the Ensembl database system at EBI. Users can also download Ensembl data and install it on the CSC servers for their own use.
Using Ensembl with EMBOSS
EMBOSS can retrieve sequence data directly from the Ensembl MySQL server at EBI. Ensembl sequences are addressed with a slightly modified USA (Uniform Sequence Address) definition of the form ensembl:species:ID-code. Species are specified by their Latin names, but for some species the English name can also be used. Ensembl Genomes entries are addressed in the same way with the ensemblgenomes: prefix; when the species name contains spaces, the whole address must be quoted. Some examples:
seqret ensembl:mouse:ENSMUST00000103109
infoseq ensembl:human:ENST00000262160
showseq ensembl:homo_sapiens:ENST00000262160
seqret 'ensemblgenomes:Escherichia coli K12:EBESCT00000004007'
seqret 'ensemblgenomes:Escherichia coli DH10B:EBESCT00000011809'
seqret 'ensemblgenomes:Schizosaccharomyces pombe:SPAC2F7.03c-1'
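When several sequences are needed, the ensembl:species:ID-code addresses can be built in a small shell loop. The following is a minimal sketch, not a tool provided by CSC: the function names ensembl_usa and fetch_ensembl_ids, the DRY_RUN variable and the ID.fasta output file naming are all assumptions made for illustration. The seqret qualifiers -sequence, -outseq and -auto are standard EMBOSS qualifiers. By default the commands are only printed; set DRY_RUN=0 to run them.

```shell
# Build an EMBOSS USA of the form ensembl:species:ID-code.
# (Hypothetical helper, not part of EMBOSS.)
ensembl_usa() {
    printf 'ensembl:%s:%s\n' "$1" "$2"
}

# Fetch each listed ID of one species into its own FASTA file.
# The <ID>.fasta file naming is an assumption of this sketch.
# With DRY_RUN=1 (the default here) the commands are only printed.
fetch_ensembl_ids() {
    species="$1"
    shift
    for id in "$@"; do
        usa="$(ensembl_usa "$species" "$id")"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "seqret -sequence $usa -outseq $id.fasta -auto"
        else
            seqret -sequence "$usa" -outseq "$id.fasta" -auto
        fi
    done
}

# Example: show the command for one human transcript.
fetch_ensembl_ids human ENST00000262160
```

Keeping the address construction in one function makes it easy to check the generated addresses before any connection to the EBI server is opened.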
Retrieving fasta files from the Ensembl ftp site
The ensemblfetch command can be used to retrieve FASTA-formatted sequence data for a given species from the Ensembl ftp site. The species is defined by its Latin name, with the space character replaced by an underscore (_), e.g. homo_sapiens. By default the command retrieves genomic DNA (-type dna), but you can also retrieve transcripts (-type cdna) or protein sequences (-type pep). For example:
ensemblfetch -type cdna bos_taurus
A list of all available species names can be retrieved with the command:
ensemblfetch -names
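The naming rule and the -type option described above can be combined in a short script that retrieves several data sets for one species. This is only a sketch under stated assumptions: the helper names ensembl_species_name and fetch_species_sets and the DRY_RUN variable are hypothetical, and converting the name to lowercase is inferred from the examples in this guide (homo_sapiens, bos_taurus) rather than from documented ensemblfetch behaviour. By default the commands are only printed; set DRY_RUN=0 to run them.

```shell
# Convert a Latin species name such as "Bos taurus" to the form bos_taurus.
# Lowercasing follows the examples in this guide and is an assumption.
ensembl_species_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_'
}

# Retrieve several sequence types (dna, cdna, pep) for one species.
# With DRY_RUN=1 (the default here) the commands are only printed.
fetch_species_sets() {
    name="$(ensembl_species_name "$1")"
    shift
    for seqtype in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ensemblfetch -type $seqtype $name"
        else
            ensemblfetch -type "$seqtype" "$name"
        fi
    done
}

# Example: show the commands for cattle transcripts and proteins.
fetch_species_sets "Bos taurus" cdna pep
```

It is a good idea to compare the converted name with the output of ensemblfetch -names before starting a large download.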
Running BLAST searches against Ensembl data
The pb script associated with the latest BLAST+ can automatically fetch a species-specific data set from the Ensembl database and use it as the database to search against. Use the option -ensembl_dna species_name to search against genomic DNA, -ensembl_cdna species_name for cDNA transcripts and -ensembl_prot species_name for the species-specific protein sequence set. For example:
module load blast+
pb blastn -query queryseq.fasta -ensembl_cdna gallus_gallus -out results.txt
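To run the same query against several species, the choice between the three -ensembl_ options can be wrapped in a small helper. The sketch below is an illustration only: the function names pb_ensembl_option and pb_search_species, the DRY_RUN variable and the species_results.txt output naming are assumptions and not part of pb. It also assumes that module load blast+ has already been run. By default the commands are only printed; set DRY_RUN=0 to run them.

```shell
# Map a data type (dna, cdna or prot) to the matching pb option.
pb_ensembl_option() {
    case "$1" in
        dna)  echo "-ensembl_dna" ;;
        cdna) echo "-ensembl_cdna" ;;
        prot) echo "-ensembl_prot" ;;
        *)    echo "unknown data type: $1" >&2; return 1 ;;
    esac
}

# Search one query file against the same data type of several species.
# Arguments: BLAST program, query file, data type, species names...
# The <species>_results.txt naming is an assumption of this sketch.
pb_search_species() {
    program="$1"
    query="$2"
    opt="$(pb_ensembl_option "$3")" || return 1
    shift 3
    for sp in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "pb $program -query $query $opt $sp -out ${sp}_results.txt"
        else
            pb "$program" -query "$query" "$opt" "$sp" -out "${sp}_results.txt"
        fi
    done
}

# Example: show the commands for cDNA searches in chicken and cattle.
pb_search_species blastn queryseq.fasta cdna gallus_gallus bos_taurus
```

The BLAST program must still match the data: for example blastn for a nucleotide query against dna or cdna sets, and blastp for a protein query against the prot set.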
Installing an Ensembl database on the MySQL server of CSC
You can install your own copy of a specific Ensembl database on the kaivos.csc.fi database server at CSC. However, you must first apply for a new database and a database account. For more information about getting access to the database server of CSC, see chapter 7 of the CSC Data services guide (http://www.csc.fi/english/pages/data-services/databases/index_html).