High-performance computing assists in height research
22.02.2008
Human growth and attained height are typical multi-factorial traits, determined not only by genetic factors but also by environmental factors such as nutrition during adolescence and conditions during pregnancy. Genetic factors have a significant impact on stature; scientists estimate that they explain more than 80% of the observed variation in height. According to the National Public Health Institute research group directed by Markus Perola and Leena Peltonen-Palotie, the genes we have inherited from our parents determine our potential maximum height; whether we attain this height depends on whether the environmental factors are favorable with regard to growth.
The genetic background of multi-factorial traits, such as height, is highly complex. It is likely that there are dozens of genes with small individual effects – in the case of height, only a few millimeters – underlying the genetic architecture. Establishing such genetic influence requires extensive research data, the statistical use of which in turn requires enhanced computing capacity.
The research group analyzed the genomes of more than 6,600 European twins; in total, the research involved more than 10,000 twins of European origin and their siblings. The research data was collected by the GenomEUtwin consortium, an international research effort financed by the EU and coordinated by the National Public Health Institute. The population cohorts used consisted of Australian, Danish, Dutch, English, Finnish and Swedish families containing twin pairs. The research effort, which constitutes the most extensive published molecular genetic twin study to date, is led by Professor Leena Palotie. Results from the study were published in June 2007 in PLoS Genetics, an internationally recognized journal in the field of genetics.
Genes affecting stature were located by means of genetic linkage analysis using genetic markers known to show variation between individuals. The markers, which are evenly distributed across the entire genome, were determined from DNA samples. By using these markers it is possible to determine, by means of the linkage analysis, the chromosomal regions that are similar in family members showing a strong correlation in height. As a result, scientists can estimate the statistical probability of the region containing genes associated with adult height.
However, the statistical significance of the linkage evidence must be determined by means of simulation, which necessitates high-performance computing. In practice, this involves creating a large number of computer-generated random cohorts. The virtual cohorts thus created resemble the original population-based cohorts. They also correspond to the null hypothesis of the linkage analysis, in which none of the genetic markers is linked to the height genes.
The research team generated 100 virtual cohorts, which were analyzed in exactly the same way as the population-based cohorts. This yielded an empirical distribution of the linkage indicator values that would be observed in a purely random test arrangement. Comparing the indicator values from the population-based cohorts with this distribution gave an estimate of the statistical significance of the linkage results. This process – evaluating the empirical statistical significance – naturally increased the required computing time a hundredfold. In this study, only one chromosomal region, on chromosome 8, was found to show evidence of significant linkage with height in the European population.
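The comparison against simulated cohorts can be illustrated with a minimal Python sketch. It is not the research group's code: the function name, the example values and the "+1" correction (a common convention that keeps the estimate above zero) are assumptions made for illustration, and the article does not say which linkage statistic was used.

```python
def empirical_p_value(observed, null_values):
    """Estimate how often a null (simulated) cohort scores at least as high
    as the real cohort.

    observed    -- linkage indicator value from the population-based cohort
    null_values -- the same indicator computed in each simulated cohort
    The +1 in numerator and denominator is an assumed convention, not
    something stated in the article.
    """
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Hypothetical numbers: four simulated cohorts, one of which beats the
# observed value of 3.0.
p = empirical_p_value(3.0, [1.0, 2.0, 3.5, 0.5])
print(p)
```

With 100 simulated cohorts, none of which reaches the observed value, this estimator gives 1/101, which is the smallest significance level that 100 simulations can resolve under this convention.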
The virtual cohorts were generated and analyzed using CSC’s supercomputers Corona and Louhi. They are extremely well suited for this purpose, since in a genome-wide linkage analysis each virtual cohort and chromosome may be computed in parallel using individual CPUs, thus achieving a significant reduction in the actual computing time required. Parallel computing is made possible by the fact that the cohorts and chromosomes are mutually independent and can thus be analyzed separately; the results can be combined once the analyses have been completed. If one CPU were available for each chromosome in each virtual cohort, their analysis would require the same amount of time as the analysis of the population-based cohort. From the perspective of the busy gene mapper this would naturally be an ideal solution.
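The independence of cohorts and chromosomes described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `analyze` is a hypothetical placeholder that returns a random score instead of running a real linkage analysis, the figure of 22 chromosomes is assumed (the article does not give the number analyzed), and a thread pool stands in for the separate CPUs of a supercomputer.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import product

N_CHROMOSOMES = 22  # assumption: autosomes only; not stated in the article


def analyze(task):
    """Hypothetical stand-in for the linkage analysis of one chromosome
    in one virtual cohort. Returns a placeholder score, not a real statistic."""
    cohort_id, chromosome = task
    rng = random.Random(cohort_id * 1000 + chromosome)  # reproducible per task
    score = max(rng.random() for _ in range(50))
    return cohort_id, chromosome, score


def run_all(n_cohorts, n_chromosomes=N_CHROMOSOMES, max_workers=8):
    """Run every (cohort, chromosome) task independently, then combine."""
    tasks = list(product(range(n_cohorts), range(1, n_chromosomes + 1)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(analyze, tasks))
    # Combination step: keep the highest score seen in each cohort.
    best = {}
    for cohort_id, _chromosome, score in results:
        best[cohort_id] = max(best.get(cohort_id, float("-inf")), score)
    return best


genome_wide_max = run_all(n_cohorts=100)
print(len(genome_wide_max))
```

Because no task depends on another, the number of workers changes only the elapsed time, not the result. Under the assumed figures, 100 cohorts and 22 chromosomes give 2,200 independent tasks.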
Publication:
Perola M, Sammalisto S, Hiekkalinna T, Martin NG, Visscher PM, Montgomery GW, Benyamin B, Harris JR, Boomsma D, Willemsen G, Hottenga JJ, Christensen K, Kyvik KO, Sørensen TI, Pedersen NL, Magnusson PK, Spector TD, Widen E, Silventoinen K, Kaprio J, Palotie A, Peltonen L; GenomEUtwin Project. Combined genome scans for body stature in 6,602 European twins: evidence for common Caucasian loci. PLoS Genet. 2007 Jun;3(6):e97. Epub 2007 May 2.
Further information:
Markus Perola, Docent, National Public Health Institute, Department of Molecular Medicine. markus.perola at ktl.fi
Leena Peltonen-Palotie, Professor, National Public Health Institute, Department of Molecular Medicine. leena.palotie at ktl.fi
CSC
CSC, the Finnish IT center for science, is administered by the Ministry of Education. CSC is a non-profit company providing IT support and resources for academia, research institutes and companies: modeling, computing and information services. CSC provides Finland’s widest selection of scientific software and databases and Finland’s most powerful supercomputing environment that researchers can use via the Funet network.