
Bioinformatics with large data-sets and the database service of CSC

Modern research methods in the biosciences can rapidly produce very large data-sets. While the analysis methods themselves typically do not change, handling large data-sets raises new challenges and scalability issues. For example, it is quite feasible to run a BLAST analysis on tens of thousands of sequences, but not by cutting and pasting them one by one into a web form.

The CSC computing resources are well suited to such large-scale analysis. There can, however, be a huge difference in the time and resources required, depending on how the analysis is set up.
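As a rough illustration of how the set-up of an analysis can matter, one common approach is to split a large multi-sequence FASTA file into smaller chunks, so that each chunk can be processed as a separate batch job. The sketch below is only an assumption about what such a step might look like, not the method taught on the course; the function names (read_fasta, chunk_records, split_fasta) and the output file naming are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(parts)


def chunk_records(records: Iterable[Record], chunk_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[Record] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # last, possibly shorter, chunk


def split_fasta(in_path: str, out_prefix: str, chunk_size: int) -> List[str]:
    """Write numbered chunk files (hypothetical naming) and return their paths."""
    paths: List[str] = []
    with open(in_path) as handle:
        chunks = chunk_records(read_fasta(handle), chunk_size)
        for index, chunk in enumerate(chunks, start=1):
            path = f"{out_prefix}_{index}.fasta"
            with open(path, "w") as out:
                for header, sequence in chunk:
                    out.write(f">{header}\n{sequence}\n")
            paths.append(path)
    return paths
```

Each resulting file could then be given to one job in a batch or array job, instead of submitting the sequences one at a time.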

This course offers an introduction to how best to use the CSC computing environment to move, store and analyze large data-sets.

The course lasts two days. The first day concentrates on working in the Unix environment of CSC. The second day covers the database service of CSC and introduces the MySQL database server linked to the supercomputer environment at CSC. You can register for both days or for just one (the 1st or the 2nd day).

Examples used on the course will be mostly from sequence analysis, but the methods presented are easily adapted to any kind of computational analysis.

The course is intended as an introduction to the subject. No previous familiarity with large-scale computation is required, but basic familiarity with command-line-based systems (such as the CSC application servers) and knowledge of relational databases is helpful.


Topics covered on this course include:

Day 1: Bioinformatics with large data-sets

  • How to move large data-sets between CSC and your own system
  • How and where to store the data at CSC
  • How to best utilize the CSC computing resources for your data
    • Which machine is best suited to your job
    • What are your options for running your job
    • How to run batch jobs and array jobs
    • How to optimize your batch scripts
  • Automating analysis
    • Introduction to shell scripting

Day 2: The database service of CSC
  • Introduction to the database service
  • User interfaces of databases
  • Data import and export 
  • Using the MySQL client through the batch job system
  • Using the database service directly from your local computer

Program

Day 1. Bioinformatics with large data-sets

9.00  Registration and coffee

9.15 Course starts

12.00-12.30 Lunch

14.00 Coffee

17.00 Course ends

Day 2. The database service of CSC

9.00  Registration and coffee

9.15 Course starts

12.00-12.30 Lunch

14.00 Coffee

17.00 Course ends

Date: 02.12.2009 09:00 - 03.12.2009 17:00
Location: Premises of CSC, Keilaranta 14, Keilaniemi, Espoo.
Language: English
Lecturers: Ari-Matti Saren, Kimmo Mattila
Price:
  • 70 euros + VAT (23%) for Finnish academics
  • 210 euros + VAT (23%) for others
The fee includes course materials and morning and afternoon coffee.

Registration

Registration closed on 27.11.2009 at 16:00.
There are 24 seats on the course, and seats are allocated on a "first come, first served" basis. The course will be arranged if there are at least 8 registrations. A confirmation email will be sent to the participants about one week before the course. Participants can cancel their registration without extra cost no later than three business days before the course. Cancellation after that is possible, but the whole course fee will be charged. Participants will get a course certificate after attending the course. Bills will be sent to the participants after the course by mail (not by email) or as an electronic bill.

Additional information

Ari-Matti Saren (09-457 2282, Ari-Matti.Saren [at] csc.fi) or Kimmo Mattila (09-457 2708, kimmo.mattila [at] csc.fi)

