Biomedical Engineering Reference
In-Depth Information
Fig. 6. Boxplot of −log10(p-value) values for the different layers of the FLTM, resulting from
association tests of the phenotype with the causal SNP ancestor nodes (As) or with the causal SNP
non-ancestor nodes (Os), on real data. Layer 0 refers to the association test between the phenotype
and the causal SNP (marker 19). In layer 3, no O nodes are observed in the FLTMs.
To take into account the stochastic nature of the algorithm (random initialization of
parameters during the EM algorithm), 1000 runs were performed. Each run takes on
average 5.4 s on a standard PC (3 GHz, 2 GB RAM). On average, over
all 1000 FLTMs (1000 replicates), the percentages of nodes are distributed as follows:
82.62% in layer 0, 16.89% in layer 1, 0.39% in layer 2 and 0.10% in layer 3. Figure 6
shows the −log10(p-value) values of association tests relative to As and Os. As expected
in view of the experiments conducted on simulated data, the A nodes succeed in capturing
indirect association, in particular in layer 1, with a median value of 5.5, corresponding
to p-values lower than 5 × 10⁻⁶. In the other layers, the strength of association is lower
but remains relatively high, as in layer 2, which shows a median value of 4, equivalent to
a p-value of 10⁻⁴. As previously seen, when we focus on O nodes, we observe very few
strong associations. The majority of p-values (over 80%) are greater than 0.01.
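The correspondence between the −log10 scale used in Figure 6 and raw p-values can be checked with a minimal sketch. The function names below are illustrative assumptions and are not part of the FLTM software; only the median scores (5.5 and 4) and the 0.01 threshold are taken from the text.

```python
import math


def neg_log10(p_value: float) -> float:
    """Return -log10(p) for a p-value in (0, 1]."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    return -math.log10(p_value)


def p_from_neg_log10(score: float) -> float:
    """Invert the transform: p = 10 ** (-score)."""
    return 10.0 ** (-score)


# Median -log10(p-value) scores quoted in the text for A nodes.
for layer, score in [(1, 5.5), (2, 4.0)]:
    p = p_from_neg_log10(score)
    print(f"layer {layer}: -log10(p) = {score} -> p = {p:.2e}")

# The 0.01 threshold mentioned for O nodes, expressed on the plot's scale.
print(f"p = 0.01 -> -log10(p) = {neg_log10(0.01):.1f}")
```

On this scale, a median of 5.5 corresponds to p ≈ 3.2 × 10⁻⁶, which is below the 5 × 10⁻⁶ bound quoted above, and p-values greater than 0.01 fall below 2 on the vertical axis of the boxplot.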
8 Conclusions
Based on both simulated and real data analyses, this chapter promotes the use of FLTMs
as a simple and useful framework for disease association detection in human genetics.
Indirect genetic association is captured efficiently for two main reasons:
(i) the causal SNP ancestor nodes succeed in capturing indirect associations with the
phenotype; (ii) conversely, the other latent nodes globally show very weak associations.
In other words, this property makes it possible to distinguish between true and false
indirect genetic associations.
The number of SNPs in the benchmarks used for the simulations was limited.
Nonetheless, this limitation does not bias the characterization of the fading
of information in the FLTM hierarchies: bottom-up information decay concerns
the forest depth and does not interfere with the forest width. It must be underlined that
the tests were not designed to meet the small n, large p condition (many more variables
(SNPs) than subjects) encountered in genome-wide association studies (GWASs). Again, this
does not bias the study: over thirty-six different scenarios, it was shown that the
overwhelming part (about three quarters) of false positives is confined to a single tree,
namely the one harbouring the causal SNP (the causal tree). Although in the conditions of
a GWAS the forest width may well be far larger than that observed in our tests, the false
positives are expected to remain confined, for the most part, to the causal tree.
 