Biomedical Engineering Reference
In-Depth Information
variables erroneously associated with the disease. Therefore, realistic simulated data
designed to harbour a causal SNP must be generated. We call indirect genetic association
any dependence between the disease and a causal SNP ancestor node (abbreviated as A) in
the FLTM. This dependence arises because an A node is likely to capture the information
carried by the causal SNP. If indirect genetic association can be evidenced for A nodes,
identifying these nodes will point to potentially causal markers, since the latter are
leaf nodes of the trees rooted in A nodes (see Figure 2, which clarifies the key terms
used hereafter). The purpose is
to examine the difference between causal SNP ancestors (As) and other latent nodes
(abbreviated as Os). The behaviour of Os in causal trees (OTs) and of Os outside causal
trees (OOs) will also be examined.
6.1 Simulation of Realistic Genotypic Data
Conducting a systematic analysis under controlled conditions requires that we are able
to simulate both realistic SNP data and an association between one of these SNPs
and the disease status (affected/unaffected). For this purpose, one of the most widely
used software applications was chosen, namely HAPGEN
(http://www.stats.ox.ac.uk/marchini/software/gwas/hapgen.html) [24]. Readers well
acquainted with HAPGEN simulations may skip the following two paragraphs, which describe
the simulation in the case of a single causal SNP.
Generating realistic simulated genotypes hinges on the ability to mimic linkage
disequilibrium. HAPGEN relies on the haplotypes (sequences of alleles) of a reference
population to generate new haplotypes as mosaics of the known haplotypes, for a
user-specified number of cases and controls. The genotype of any individual is derived
from the two haplotypes simulated for this individual.
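
The mosaic idea can be illustrated with a toy sketch. It is not HAPGEN's algorithm: the constant `switch_prob` is an assumed stand-in for the recombination-driven break probability, mutations are omitted, and the function names are hypothetical.

```python
import random


def simulate_mosaic_haplotype(reference, switch_prob, rng):
    """Build one haplotype by copying alleles from reference haplotypes.

    reference   : list of equal-length haplotypes (lists of 0/1 alleles)
    switch_prob : assumed constant probability of switching the copied
                  template between two adjacent loci (toy simplification)
    rng         : random.Random instance, for reproducibility
    """
    n_loci = len(reference[0])
    template = rng.randrange(len(reference))  # haplotype currently copied
    haplotype = []
    for locus in range(n_loci):
        # A break in the mosaic: pick a new template at random.
        if locus > 0 and rng.random() < switch_prob:
            template = rng.randrange(len(reference))
        haplotype.append(reference[template][locus])
    return haplotype


def genotype_from_haplotypes(hap_a, hap_b):
    """Genotype at each locus = number of copies of allele 1 (0, 1 or 2)."""
    return [a + b for a, b in zip(hap_a, hap_b)]
```

With `switch_prob` set to 0 the simulated haplotype is an exact copy of one reference haplotype; larger values produce finer mosaics.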
HAPGEN selects the causal SNP at random, checking that its minor allele frequency
lies within a user-specified range. Assuming causality under a specific disease model
and effect sizes, it is straightforward to calculate the genotype frequencies in cases at
that locus. On this basis, any case individual is simulated by first simulating the alleles at
the causal locus and then working outwards in each direction to construct the two haplo-
types. Note that the same mechanism governs the construction of haplotypes, whatever
the status of the individual (case or control). The only distinction lies in that the locus
from which the extension is started is chosen at random, for controls. For cases, the
extension is initiated from the causal locus. The extension proceeds conditionally on
the reference haplotypes and is governed by fine-scale recombination rates and the
physical distance between loci, which determine the probability of a break in the
mosaic pattern as one moves along the region. Moreover, the copied haplotype
subregions are blurred by simulated mutations.
To control the simulation conditions, three ingredients are combined: the minor allele
frequency (MAF) of the causal SNP, the severity of the disease expressed as genotype
relative risks (GRRs), and the disease model. The range of the MAF at the causal SNP
is specified to be 0.1-0.2, 0.2-0.3 or 0.3-0.4. Various genotype relative risks are
considered, and the disease model is chosen among additive, dominant, multiplicative
and recessive (add, dom, mul, rec). These choices follow standard practice for
simulations in association genetics.
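
The combination of these three ingredients can be sketched as a grid of simulation conditions. The MAF ranges and model labels come from the text; the mapping from a single GRR value to heterozygote and homozygote relative risks is one common convention among several and is an assumption here, as are the function names. The GRR values themselves are left as a parameter because the text does not list them.

```python
from itertools import product

MAF_RANGES = [(0.1, 0.2), (0.2, 0.3), (0.3, 0.4)]
DISEASE_MODELS = ["add", "dom", "mul", "rec"]


def relative_risks(model, grr):
    """Return (heterozygote RR, homozygote RR) for a disease model.

    Assumed convention (others exist, notably for the additive model):
      add: (grr, 2*grr - 1)   dom: (grr, grr)
      mul: (grr, grr**2)      rec: (1, grr)
    """
    if model == "add":
        return (grr, 2 * grr - 1)
    if model == "dom":
        return (grr, grr)
    if model == "mul":
        return (grr, grr * grr)
    if model == "rec":
        return (1.0, grr)
    raise ValueError(f"unknown disease model: {model}")


def simulation_grid(grr_values):
    """Enumerate every (MAF range, disease model, GRR) condition."""
    return [
        {
            "maf_range": maf_range,
            "model": model,
            "grr": grr,
            "relative_risks": relative_risks(model, grr),
        }
        for maf_range, model, grr in product(MAF_RANGES, DISEASE_MODELS, grr_values)
    ]
```

For instance, `simulation_grid([1.2, 1.5])` (placeholder GRR values, not those of the study) enumerates 3 x 4 x 2 = 24 conditions.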