Forests of Latent Tree Models to Decipher
Genotype-Phenotype Associations
Christine Sinoquet 1, Raphael Mourad 2, and Philippe Leray 3
1 LINA, UMR CNRS 6241, Université de Nantes, 2 rue de la Houssinière, BP 92208, 44322
Nantes Cedex, France
2 Center for Computational Biology and Bioinformatics, Department of Molecular and Medical
Genetics, Indiana University, Indianapolis, IN, 46002, U.S.A.
3 LINA, UMR CNRS 6241, École Polytechnique de l'Université de Nantes,
rue Christian Pauc, BP 50609, 44306 Nantes Cedex 3, France
{christine.sinoquet,philippe.leray}@univ-nantes.fr
Abstract. Genome-wide association studies have revolutionized the search for
genetic influences on common diseases such as diabetes, obesity, asthma,
cardiovascular diseases and some cancers. Together with concerns about population
aging, rising health care costs call for further investigation to design scalable
and efficient tools. The high dimensionality and complexity of genetic data hinder
the detection of genetic associations. To decrease the risks of missing the causal
factor and of discovering spurious associations, machine learning offers an
attractive alternative framework to classical statistical approaches. A novel class
of probabilistic graphical models (PGMs), the forest of latent tree models (FLTMs),
has recently been proposed to reach a trade-off between faithful modeling of data
dependences and tractability. In this chapter, we assess the potential of this
model to detect genotype-phenotype associations. The FLTM-based contribution is
first put into the perspective of PGM-based works that model dependences in
genetic data; it is then considered from the technical viewpoint of latent tree
model (LTM) learning, with the key objective of scalability in mind. We then
present the systematic and comprehensive evaluation conducted to assess the
ability of the FLTM model to detect genetic associations through latent variables.
Realistic simulations were performed under various controlled conditions. In this
context, we present a procedure tailored to correct for multiple testing. We also
show and discuss results obtained on real data. Besides guaranteeing data dimension
reduction through latent variables, the FLTM model is shown empirically to capture
indirect genetic associations with the disease: strong associations are found
between the disease and the ancestors of the causal genetic marker node in the
forest, whereas very weak associations are obtained for the other latent variables.
Finally, we discuss the prospects of the model for association detection at genome
scale.
Keywords: Probabilistic Graphical Model, Bayesian Network, Latent Tree Model,
Detection of Genetic Association, Latent Variable, Data Dimension Reduction,
Scalability.
Christine Sinoquet and Raphael Mourad are joint first authors.
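To make the idea of "detecting genetic associations through latent variables" concrete, the sketch below tests each latent variable of a forest against a case/control phenotype. It is a minimal illustration under stated assumptions, not the authors' method: it assumes every individual has already been assigned one discrete state per latent variable (for example its most probable state), it uses a plain Pearson chi-square test of independence, and it corrects for multiple testing with an ordinary Bonferroni adjustment in place of the tailored procedure mentioned in the abstract. Bonferroni ignores the dependence between latent variables in the same tree and is therefore conservative. The function names `chi2_sf`, `chi2_independence` and `scan_latent_variables`, and the toy variables `H1` and `H2`, are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1 (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: erfc term (the df = 1 case) plus (df - 1) / 2 correction terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi2_independence(states, phenotype):
    """Pearson chi-square test between one latent variable and the phenotype.

    states    -- imputed latent state per individual (assumed already computed)
    phenotype -- label per individual, e.g. 0 = control, 1 = case
    Returns (statistic, degrees_of_freedom, p_value).
    """
    if len(states) != len(phenotype):
        raise ValueError("states and phenotype must have the same length")
    n = len(states)
    counts = Counter(zip(states, phenotype))
    row_tot = Counter(states)      # only observed latent states appear here
    col_tot = Counter(phenotype)
    stat = 0.0
    for s, r in row_tot.items():
        for y, c in col_tot.items():
            expected = r * c / n
            stat += (counts[(s, y)] - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    if df == 0:
        return 0.0, 0, 1.0         # a constant variable carries no signal
    return stat, df, chi2_sf(stat, df)


def scan_latent_variables(latent, phenotype, alpha=0.05):
    """Test every latent variable; adjust p-values with plain Bonferroni."""
    m = len(latent)
    results = {}
    for name, states in latent.items():
        stat, df, p = chi2_independence(states, phenotype)
        p_adj = min(1.0, p * m)
        results[name] = (stat, df, p, p_adj, p_adj <= alpha)
    return results


if __name__ == "__main__":
    phenotype = [0] * 50 + [1] * 50
    latent = {
        "H1": [0] * 50 + [1] * 50,  # toy variable that tracks the phenotype
        "H2": [0, 1] * 50,          # toy variable balanced across cases/controls
    }
    for name, (stat, df, p, p_adj, sig) in scan_latent_variables(latent, phenotype).items():
        print(f"{name}: chi2={stat:.1f} df={df} p={p:.3g} p_bonf={p_adj:.3g} significant={sig}")
```

In this toy run, `H1` behaves like an ancestor of a causal marker (strong association that survives correction), while `H2` shows no association at all, mirroring in a very simplified way the contrast the abstract describes between the ancestors of the causal node and the other latent variables.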