Forests of Latent Tree Models to Decipher Genotype-Phenotype Associations - Biomedical Engineering Systems and Technologies - page 105


In short, together with the genotype relative risks (GRRs), the disease models specify the probability of being affected as a function of the genotype at the causal locus:

$$\mathrm{GRR} = \frac{P(\text{affected} \mid Aa)}{P(\text{affected} \mid aa)},$$

where A is the disease allele. Choosing the disease model among add, dom, mul and rec adjusts the probability of being affected when carrying two disease alleles (AA) relative to the probability of being affected when carrying Aa (or aA). Thus various effect sizes may be simulated (see Table 3).

Table 3. The genotype relative risks for the four standard disease models. The value 1 is the reference effect when no disease allele (A) is present at the causal locus (aa). The effect sizes for carriers of one disease allele (Aa or aA) and of two disease alleles (AA) are given for all four disease models.

Genotype Relative Risk

Disease model     Major homozygous    Heterozygous    Minor homozygous
                  aa                  aA (or Aa)      AA
additive          1                   1 + α/2         1 + α
dominant          1                   1 + α           1 + α
multiplicative    1                   1 + α           (1 + α)^2
recessive         1                   1               1 + α
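The entries of Table 3 can be sketched as a small lookup, given here only as an illustration. The function names and the `baseline` parameter (standing for P(affected | aa)) are hypothetical and do not come from the original study. The multiplicative entry for AA is read as (1 + α)^2, the usual form for this model.

```python
def genotype_relative_risks(model: str, alpha: float) -> dict:
    """Return the GRRs of genotypes aa, Aa and AA for one disease model.

    `model` is one of "add", "dom", "mul", "rec"; `alpha` is the effect size.
    The reference genotype aa always has relative risk 1.
    """
    heterozygous = {
        "add": 1 + alpha / 2,
        "dom": 1 + alpha,
        "mul": 1 + alpha,
        "rec": 1.0,
    }
    minor_homozygous = {
        "add": 1 + alpha,
        "dom": 1 + alpha,
        "mul": (1 + alpha) ** 2,  # assumed reading of the garbled table cell
        "rec": 1 + alpha,
    }
    if model not in heterozygous:
        raise ValueError(f"unknown disease model: {model!r}")
    return {"aa": 1.0, "Aa": heterozygous[model], "AA": minor_homozygous[model]}


def penetrances(model: str, alpha: float, baseline: float) -> dict:
    """Turn GRRs into P(affected | genotype), using GRR = P(.|g) / P(.|aa).

    `baseline` is a hypothetical input standing for P(affected | aa).
    """
    grr = genotype_relative_risks(model, alpha)
    return {genotype: baseline * risk for genotype, risk in grr.items()}
```

For instance, `genotype_relative_risks("rec", 0.5)` leaves heterozygous carriers at the reference risk and raises only the AA risk.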

HAPGEN was run on the widely used HapMap phase II reference haplotypes from U.S. residents of northern and western European ancestry (CEU) (http://hapmap.ncbi.nlm.nih.gov/). The disease prevalence (percentage of cases observed in a population) specified to HAPGEN was set to 0.01, a standard value used for disease locus simulation. The simulated data were generated for 1000 unaffected subjects and 1000 affected subjects and consist of unphased genotypes for a 1.5 Mb region containing around 100 SNPs. Combining all previous conditions leads to testing 36 scenarios (3 × 3 × 4). To derive significant trends, each scenario was replicated 100 times. Together with the objective of a comprehensive study, the need for replication explains the choice of the number of variables (100 SNPs). Standard quality control for genotypic data was carried out: SNPs with a minor allele frequency (MAF) below 0.05 and SNPs deviating from Hardy-Weinberg equilibrium (not detailed) with a p-value below 0.001 were removed.
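The quality-control step can be illustrated with a minimal sketch. The text does not say which Hardy-Weinberg test was used, so a Pearson chi-squared test with one degree of freedom is assumed here. The function names and the genotype-count interface are hypothetical.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of the rarer allele, from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1 - p)


def hwe_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a Pearson chi-squared test (1 df) of Hardy-Weinberg equilibrium.

    Assumption: the source does not detail its HWE test; this is one common choice.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no deviation can be measured
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa: int, n_ab: int, n_bb: int,
              maf_min: float = 0.05, hwe_p_min: float = 0.001) -> bool:
    """Keep a SNP only if MAF >= 0.05 and the HWE p-value >= 0.001."""
    return (minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min
            and hwe_p_value(n_aa, n_ab, n_bb) >= hwe_p_min)
```

The default thresholds are those quoted in the text. A SNP with genotype counts in exact Hardy-Weinberg proportions, such as 250/500/250, passes both filters.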

6.2 Choice of the Association Test

The standard $G^2$ test of independence was preferred over the well-known chi-squared ($\chi^2$) test. For relatively small sample sizes (below 300 subjects), as in the real dataset analyzed in Subsection 7.2, $G^2$ is recommended:

$$G^2 = 2 \sum_{i,j} o_{ij} \ln\left(\frac{o_{ij}}{e_{ij}}\right),$$

where $o_{ij}$ and $e_{ij}$ are the observed and expected frequencies (in the absence of genotype-phenotype association) in the cells of the genotypes × phenotypes table. Various p-values were obtained through successive tests of the phenotype Y against, respectively, the causal SNP, the ancestor nodes of the causal SNP (A nodes) and the other nodes (abbreviated as Os) in the FLTM's graph. The phenotype Y is the affected/unaffected status.
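As an illustration, the G² statistic can be computed from a contingency table of observed counts as follows. This is a sketch under stated assumptions, not the authors' implementation. The function names are hypothetical, and the p-value helper covers only a 3 × 2 table (three genotypes against affected/unaffected status), for which the chi-squared reference distribution has 2 degrees of freedom.

```python
import math


def g2_statistic(table):
    """G^2 = 2 * sum_ij o_ij * ln(o_ij / e_ij) for a table of observed counts.

    `table` is a list of rows (e.g. genotypes) of counts per column
    (e.g. phenotypes). Expected counts come from the row and column totals,
    i.e. from the hypothesis of no genotype-phenotype association.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # an empty cell contributes 0 (limit of o * ln o)
                expected = row_totals[i] * col_totals[j] / n
                total += observed * math.log(observed / expected)
    return 2 * total


def g2_p_value_3x2(table):
    """Asymptotic p-value for a 3 x 2 table (2 degrees of freedom).

    For 2 df, the chi-squared survival function is exp(-x / 2).
    """
    if len(table) != 3 or any(len(row) != 2 for row in table):
        raise ValueError("this helper only handles 3 x 2 tables")
    return math.exp(-g2_statistic(table) / 2)
```

A table whose rows all share the same case/control ratio yields G² = 0 and a p-value of 1. Latent variables with a different number of states would need the general chi-squared survival function with (rows − 1) × (columns − 1) degrees of freedom.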

6.3 Adapted Correction for Multiple Testing

To measure the significance of associations, it is necessary to adapt a permutation procedure dedicated to the computation of the per-test error rate α (type I error), in order
