Bioinformatics with large data-sets and the database service of CSC
Modern research methods in biosciences can rapidly produce very
large data-sets. While the actual analysis methods typically do not
change, the handling of large data-sets gives rise to new challenges
and scalability issues. For example, it is quite feasible to run a
BLAST analysis for tens of thousands of sequences, but not by
cutting and pasting them one by one into a web form.
The CSC computing resources are well suited to performing such large-scale analyses. There can, however, be a huge difference in the time and resources required, depending on how the analysis is set up.
This course offers an introduction to making the best use of the
CSC computing environment to move, store and analyze large data-sets.
The duration of the course is two days. The first day concentrates on
working in the Unix environment of CSC. The second day covers the
database service of CSC and introduces the MySQL database server linked
to the supercomputer environment at CSC. It is possible to register for one day only (1st day or 2nd day) or for both days.
Examples used in the course will be mostly from sequence analysis, but the methods presented are easily adapted to any kind of computational analysis.
The course is intended as an introduction to the subject. No previous familiarity with large-scale computation is required, but basic familiarity with command-line based systems (such as the CSC application servers) and some knowledge of relational databases are helpful.
Topics covered in this course include:
Day 1: Bioinformatics with large data-sets
- How to move large data-sets between CSC and your own system
- How and where to store the data at CSC
- How to best utilize the CSC computing resources for your data
- Which machine is best suited to your job
- What are your options for running your job
- How to run batch jobs and array jobs
- How to optimize your batch scripts
- Automating analysis
- Introduction to shell scripting
Day 2: The database service of CSC
- Introduction to the database service
- User interfaces of databases
- Data import and export
- Using the MySQL client through the batch job system
- Using the database service directly from your local computer
Program
Day 1. Bioinformatics with large data-sets
9.00 Registration and coffee
9.15 Course starts
12.00-12.30 Lunch
14.00 Coffee
17.00 Course ends
Day 2. The database service of CSC
9.00 Registration and coffee
9.15 Course starts
12.00-12.30 Lunch
14.00 Coffee
17.00 Course ends
Materials
- MySQL Introduction
- Using MySQL in kaivos.csc.fi
- MySQL exercises
- Bioinformatics with large data-sets exercises, day 1
- Bioinformatics with large data-sets lectures, day 1
Fees
- 70 euros + VAT (23%) for Finnish academics
- 210 euros + VAT (23%) for others
Registration
Additional information
Ari-Matti Saren (09-457 2282, Ari-Matti.Saren [at] csc.fi) or Kimmo Mattila (09-457 2708, kimmo.mattila [at] csc.fi)
