Example of MERLIN input files

The input data for MERLIN can be given in QTDT format (as in this example) or in LINKAGE format.

The example pedigree consists of 19 individuals in 4 generations.

Pedigree file

The pedigree structure and the genotypes are specified in a pedigree file:

1  1  0  0 1 1   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 2 0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 3 1 2 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 4 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 5 1 2 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 6 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 7 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 8 1 2 2 2 6 4 5 6 5 2 0 0 1 1 3 4 2 6 5 1
1 9 3 4 1 2 4 4 6 1 2 2 3 5 1 7 4 2 6 6 4 1
1 10 3 4 2 1 6 4 0 0 5 2 5 5 1 7 3 2 2 6 5 1
1 11 5 6 1 1 6 4 0 0 0 0 0 0 1 3 0 0 2 3 5 4
1 12 5 6 2 1 6 5 5 5 5 5 5 4 1 7 3 3 2 6 5 4
1 13 5 6 1 2 4 5 6 5 2 5 0 0 1 7 4 3 6 6 1 4
1 14 7 8 1 2 1 4 1 6 1 2 3 3 7 1 3 4 3 6 5 1
1 15 7 8 2 2 1 4 1 6 6 2 3 3 3 1 3 4 3 6 2 1
1 16 0 0 1 1 6 2 5 1 1 3 5 4 2 7 2 3 6 6 4 2
1 17 16 12 1 2 2 6 0 0 3 5 0 0 7 1 3 3 6 2 2 5
1 18 16 12 2 1 6 5 5 5 1 5 5 4 2 7 2 3 6 6 4 4
1 19 16 12 2 1 2 6 1 5 3 5 4 5 7 1 3 3 6 2 0 0

The interpretation of the first five columns is fixed:

1. Pedigree identifier
2. Person identifier
3. Identifier of the individual's father (0 for a founder)
4. Identifier of the individual's mother (0 for a founder)
5. Sex (1=male, 2=female)

The remaining columns contain the phenotype and genotype data. In this example they are interpreted as:

6. Affection status (1=healthy, 2=affected)
7.-n. Genotypes, 2 integers per marker

Columns for quantitative values (traits and covariates) can also be included after the five fixed columns. Missing marker genotypes are indicated by '0/0' (written as '0 0' in the example above). Missing trait and covariate values are coded as 'X'.
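
The column layout described above can be illustrated with a short Python sketch. It is not part of MERLIN; the names PedigreeRow and parse_pedigree_line are hypothetical, and the code assumes the layout of this example only (five fixed columns, one affection status column, then two allele columns per marker).

```python
from typing import List, NamedTuple, Tuple


class PedigreeRow(NamedTuple):
    """One line of the example pedigree file (hypothetical helper type)."""
    family: str
    person: str
    father: str
    mother: str
    sex: int
    affection: str
    genotypes: List[Tuple[str, str]]


def parse_pedigree_line(line: str) -> PedigreeRow:
    """Split one whitespace-separated pedigree line into named fields.

    Assumes the layout of this example: 5 fixed columns, 1 affection
    status column, then 2 allele columns per marker.
    """
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("unexpected number of columns: %d" % len(fields))
    family, person, father, mother = fields[:4]
    alleles = fields[6:]
    # Pair up consecutive allele columns, one pair per marker.
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedigreeRow(family, person, father, mother,
                       int(fields[4]), fields[5], genotypes)


# Individual 8 from the example pedigree file.
row = parse_pedigree_line("1 8 1 2 2 2 6 4 5 6 5 2 0 0 1 1 3 4 2 6 5 1")

# 1-based indices of the markers whose genotype is missing ('0 0').
missing = [i + 1 for i, g in enumerate(row.genotypes) if g == ("0", "0")]
print(row.person, row.sex, row.affection, missing)  # prints: 8 2 2 [4]
```

Identifiers are kept as strings because only their equality matters; the alleles are also left as strings so that the missing code '0' can be compared directly.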

Locus file

The phenotype and genotype loci are presented in the locus file:

A Disease
M M1
M M2
M M3
M M4
M M5
M M6
M M7
M M8

The first column specifies the locus type (A=affection status, M=marker, T=quantitative trait, C=covariate). The second column defines the locus name. Notice that a marker locus always corresponds to two consecutive columns in the pedigree file.
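
Because each marker occupies two pedigree columns and every other locus type listed above occupies one, the locus file determines how many columns each pedigree line should have. The following Python sketch shows that bookkeeping; the function names are hypothetical, and only the four locus types mentioned in this text are handled.

```python
from typing import List, Tuple

# Number of pedigree-file columns used by each locus type named in the text.
COLUMNS_PER_LOCUS = {"A": 1, "M": 2, "T": 1, "C": 1}
FIXED_COLUMNS = 5  # pedigree, person, father, mother, sex


def parse_locus_file(text: str) -> List[Tuple[str, str]]:
    """Return (locus_type, locus_name) pairs in file order."""
    loci = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2 or fields[0] not in COLUMNS_PER_LOCUS:
            raise ValueError("cannot interpret locus line: %r" % line)
        loci.append((fields[0], fields[1]))
    return loci


def expected_pedigree_columns(loci: List[Tuple[str, str]]) -> int:
    """Total column count each pedigree line should have for these loci."""
    return FIXED_COLUMNS + sum(COLUMNS_PER_LOCUS[kind] for kind, _ in loci)


# The locus file of this example: one affection status and eight markers.
locus_text = "A Disease\n" + "\n".join("M M%d" % i for i in range(1, 9))
loci = parse_locus_file(locus_text)
print(expected_pedigree_columns(loci))  # prints: 22
```

The result, 22, matches the number of columns on each line of the example pedigree file (5 fixed + 1 affection status + 16 allele columns).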

Map file

1 M1 0.0
1 M2 5.0
1 M3 10.0
1 M4 15.0
1 M5 20.0
1 M6 25.0
1 M7 30.0
1 M8 35.0

The first column specifies the chromosome that the marker belongs to. The second column contains the marker name and the third column contains the genetic location of the marker (in cM).
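
As a rough illustration, the three map-file columns can be read and cross-checked against the marker names of the locus file as follows. This is a sketch with hypothetical function names; it assumes a map file without a header line, as in the example above.

```python
from typing import List, Tuple


def parse_map_file(text: str) -> List[Tuple[str, str, float]]:
    """Return (chromosome, marker_name, position_cM) triples in file order."""
    markers = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3:
            raise ValueError("cannot interpret map line: %r" % line)
        markers.append((fields[0], fields[1], float(fields[2])))
    return markers


def unmapped_markers(locus_names: List[str],
                     markers: List[Tuple[str, str, float]]) -> List[str]:
    """Marker names from the locus file that have no entry in the map."""
    mapped = {name for _, name, _ in markers}
    return [name for name in locus_names if name not in mapped]


# The map file of this example: eight markers on chromosome 1, 5 cM apart.
map_text = "\n".join("1 M%d %.1f" % (i, 5.0 * (i - 1)) for i in range(1, 9))
markers = parse_map_file(map_text)

# Distances in cM between neighbouring markers.
spacings = [b[2] - a[2] for a, b in zip(markers, markers[1:])]
print(spacings)
print(unmapped_markers(["M1", "M8", "M9"], markers))  # prints: ['M9']
```

The name M9 is used here only to show what a marker that is missing from the map looks like; it does not occur in the example files.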

Allele frequency file

Allele frequencies can be estimated from the data. Alternatively, population-level frequencies can be defined in a separate allele frequency file. The format of the frequency file (shown here only for markers M1 and M2) would be:

M M1
A 1 0.047
A 2 0.312
A 3 0.128
A 4 0.016
A 5 0.250
A 6 0.247
M M2
A 1 0.142
A 2 0.222
A 3 0.021
A 4 0.216
A 5 0.159
A 6 0.240
(and so on, until all the markers in the map are covered)
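
A frequency file in the format above can be checked with a few lines of Python before it is passed to MERLIN. The sketch below uses hypothetical function names and assumes only the two record types shown here (an 'M' line naming a marker, followed by 'A' lines giving an allele and its frequency). It verifies that the frequencies of each marker sum to one.

```python
import math
from typing import Dict, List


def parse_frequency_file(text: str) -> Dict[str, Dict[str, float]]:
    """Return {marker_name: {allele: frequency}} from 'M'/'A' records."""
    freqs: Dict[str, Dict[str, float]] = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "M" and len(fields) >= 2:
            current = fields[1]
            freqs[current] = {}
        elif fields[0] == "A" and len(fields) >= 3 and current is not None:
            freqs[current][fields[1]] = float(fields[2])
        else:
            raise ValueError("cannot interpret frequency line: %r" % line)
    return freqs


def markers_not_summing_to_one(freqs: Dict[str, Dict[str, float]],
                               tol: float = 1e-6) -> List[str]:
    """Markers whose allele frequencies do not add up to 1 within tol."""
    return [marker for marker, alleles in freqs.items()
            if not math.isclose(sum(alleles.values()), 1.0, abs_tol=tol)]


# The two markers listed in the example frequency file.
freq_text = """\
M M1
A 1 0.047
A 2 0.312
A 3 0.128
A 4 0.016
A 5 0.250
A 6 0.247
M M2
A 1 0.142
A 2 0.222
A 3 0.021
A 4 0.216
A 5 0.159
A 6 0.240
"""
freqs = parse_frequency_file(freq_text)
print(markers_not_summing_to_one(freqs))  # prints: []
```

The tolerance tol is an arbitrary choice for this sketch; it only needs to absorb rounding in the three-decimal frequencies.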

An alternative format for the allele frequency file also exists; see the MERLIN documentation.

Note: if a parameter file in LINKAGE format is given, then allele frequencies are extracted from that file, and they are neither read from a frequency file nor estimated.