Table 2. Representations based on latent tree models and their corresponding learning algorithms. Scalability has been shown for Hwang et al.'s approach, to a certain extent, and for Mourad et al.'s model, to genome-scale.

Method | Model | Specificity of Model Learning | Application | Reference
Local search | General LTM | Two-level algorithm (adjustment of optimal cardinalities conditional on structure, structure optimization) | Multidimensional clustering of categorical data | Zhang (2004)
Local search | General LTM | Grow-restructure-thin strategy, optimized EM (restricted likelihood) | Inference of latent structure | Zhang and Kocka (2004); Chen et al. (2011)
SEM | Cluster-based collection of latent class models | — | LD modeling | Zhang and Ji (2009)
Hierarchical ascending clustering of variables | Binary tree-based LTM augmented with possible connections between sibling nodes, binary latent variables | — | Large-scale data analysis | Hwang et al. (2006)
Hierarchical ascending clustering of variables | Binary trees | Variant with local exact computations preceded by imputation of latent variables | Inference of latent structure | Harmeling and Williams (2011)
Hierarchical ascending clustering of variables | General arity of trees and latent variables | Processing of an initial binary tree | Approximation of the underlying Bayesian network through an LTM | Wang et al. (2008)
Partitioning of variables | General arity of trees and latent variables | — | LD modeling | Mourad et al. (2011)
From a methodological viewpoint, the recapitulation provided in Table 2 puts the contribution of Mourad and collaborators into perspective among LTM-based approaches. This table emphasizes the pioneering character of an LTM-based method dedicated to genome-scale LD modeling. The next Section describes the learning algorithm of Mourad et al.'s approach in detail.
4 The Learning Algorithm
Two versions of the FLTM learning algorithm were designed by the authors. This Section describes the version used to test the ability of the FLTM model to detect genetic associations. Five aspects of the algorithm are highlighted. The Section ends with a short recapitulation of how FLTM was evaluated with respect to LD modeling.
4.1 Outline of the Algorithm
The learning is performed through an adapted agglomerative hierarchical clustering procedure. At each iteration, a partitioning method assigns the variables to non-overlapping clusters. The partitioning relies on the identification of cliques of strongly dependent variables in the complete graph of pairwise dependences. Among the clusters, each cluster of size at least two is a candidate for subsumption into a latent variable H. To accept or reject the creation of H, a preliminary task considers the LCM rooted in this candidate latent variable, whose leaves are all the variables of the cluster. Parameter learning using the EM algorithm is performed for this LCM. Probabilistic inference then imputes the missing data of the latent variable. Once all the data are known for this LCM, a validation step checks whether the latent variable captures enough information from its children. If a latent variable is validated, its child variables are replaced with the latent variable; otherwise, the nodes of the unvalidated cluster are kept isolated for the next iteration. Iterating these steps yields a hierarchical structure.
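To make this outline concrete, the following is a minimal, runnable Python sketch of such an agglomerative construction. It is not the authors' implementation: all function names and thresholds are ours, EM-based parameter learning of the LCM and probabilistic imputation are replaced by a crude surrogate (each latent value is the frequency rank of its children's joint configuration, with a capped cardinality), pairwise dependence is measured by empirical mutual information, and validation uses an illustrative average mutual-information criterion.

# Sketch of an FLTM-style agglomerative construction (hypothetical helpers;
# the real algorithm learns an LCM with EM and imputes the latent variable
# by probabilistic inference).
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def greedy_cliques(nodes, strong_pairs):
    """Greedily partition nodes into cliques of the 'strong dependence' graph."""
    clusters, remaining = [], list(nodes)
    while remaining:
        clique = [remaining.pop(0)]
        for v in list(remaining):
            if all((min(u, v), max(u, v)) in strong_pairs for u in clique):
                clique.append(v)
                remaining.remove(v)
        clusters.append(clique)
    return clusters

def impute_latent(cluster, data):
    """Surrogate for EM + inference: the latent value is the frequency rank
    of the children's joint configuration, with cardinality capped at 4."""
    configs = list(zip(*(data[v] for v in cluster)))
    ranks = {c: i for i, (c, _) in enumerate(Counter(configs).most_common())}
    return [min(ranks[c], 3) for c in configs]

def learn_fltm(data, dep_threshold=0.1, info_threshold=0.1, max_iter=10):
    """Return the (parent, child) edges of the learned hierarchical structure."""
    edges, active, n_latent = [], sorted(data), 0
    for _ in range(max_iter):
        # Complete graph of pairwise dependences, thresholded.
        strong = {(min(u, v), max(u, v))
                  for u, v in combinations(active, 2)
                  if mutual_information(data[u], data[v]) >= dep_threshold}
        merged = False
        for cluster in greedy_cliques(active, strong):
            if len(cluster) < 2:
                continue                    # singletons stay isolated
            h = f"H{n_latent}"              # candidate latent variable
            data[h] = impute_latent(cluster, data)
            # Validation: H must capture enough information from its children.
            avg_mi = sum(mutual_information(data[h], data[v])
                         for v in cluster) / len(cluster)
            if avg_mi >= info_threshold:
                edges += [(h, v) for v in cluster]
                active = [v for v in active if v not in cluster] + [h]
                n_latent, merged = n_latent + 1, True
        if not merged:                      # no cluster validated: stop
            break
    return edges

# Toy usage: two strongly dependent pairs, (X0, X1) and (X2, X3).
rows = [(0, 0, 1, 1), (1, 1, 0, 0), (0, 0, 0, 0),
        (1, 1, 1, 1), (0, 0, 1, 1), (1, 1, 0, 0)]
data = {f"X{i}": [r[i] for r in rows] for i in range(4)}
print(learn_fltm(data))  # [('H0', 'X0'), ('H0', 'X1'), ('H1', 'X2'), ('H1', 'X3')]

In this sketch, a validated latent variable replaces its children in the set of active variables, which is what makes the procedure agglomerative; singleton and unvalidated clusters simply survive to the next iteration, and the loop stops as soon as no cluster is validated.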