Table 2. Representations based on latent tree models and their corresponding learning algorithms. Scalability has been shown for Hwang et al.'s approach, to a certain extent, and for Mourad et al.'s model, to genome-scale.

Method | Model | Specificity of Model Learning | Application | Reference
Local search | General LTM | Two-level algorithm (adjustment of optimal cardinalities conditional on structure, structure optimization) | Multidimensional clustering of categorical data | Zhang (2004)
Local search | General LTM | Grow-restructure-thin strategy, optimized EM (restricted likelihood) | Inference of latent structure | Zhang and Kocka (2004); Chen et al. (2011)
SEM | Cluster-based collection of latent class models | — | LD modeling | Zhang and Ji (2009)
Hierarchical ascending clustering of variables | Binary tree-based LTM augmented with possible connections between sibling nodes, binary latent variables | — | Large-scale data analysis | Hwang et al. (2006)
Hierarchical ascending clustering of variables | Binary trees | Variant with local exact computations preceded by imputation of latent variables | Inference of latent structure | Harmeling and Williams (2011)
Hierarchical ascending clustering of variables | General arity of trees and latent variables | Processing of an initial binary tree | Approximation of the underlying Bayesian network through an LTM | Wang et al. (2008)
Partitioning of variables | General arity of trees and latent variables | — | LD modeling | Mourad et al. (2011)
From a methodological viewpoint, the recapitulation provided in Table 2 puts the contribution of Mourad and collaborators into perspective among LTM-based approaches. This table emphasizes the pioneering character of an LTM-based method dedicated to genome-scale LD modeling. The next Section describes the learning algorithm of Mourad et al.'s approach in detail.
4 The Learning Algorithm
Two versions of the FLTM learning algorithm were designed by the authors. This Section describes the version used to test the ability of the FLTM model to detect genetic associations. Five aspects of the algorithm are highlighted. The Section ends with a short recapitulation of how FLTM was evaluated with respect to LD modeling.
4.1 Outline of the Algorithm
The learning is performed through an adapted agglomerative hierarchical clustering procedure. At each iteration, a partitioning method assigns the variables to non-overlapping clusters. The partitioning relies on the identification of cliques of strongly dependent variables in the complete graph of pairwise dependences. Among the clusters, each cluster of size at least two is a candidate for subsumption into a latent variable H. To accept or reject the creation of H, a preliminary task considers the LCM rooted in this candidate latent variable, whose leaves are all the variables of the cluster. Parameter learning using the EM algorithm is performed for this LCM. Probabilistic inference then imputes the missing data of the latent variable. Once all the data are known for this LCM, a validation step checks whether the latent variable captures enough information from its children. If a latent variable is validated, its child variables are replaced with the latent variable; otherwise, the nodes of the unvalidated cluster are kept isolated for the next iteration. Iterating these steps yields a hierarchical structure.
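To make this outline concrete, the following is a minimal, runnable Python sketch of such an agglomerative construction. It is not the authors' implementation: all function names and thresholds are ours, EM-based parameter learning of the LCM and probabilistic imputation are replaced by a crude surrogate (each latent value is the frequency rank of its children's joint configuration, with a capped cardinality), pairwise dependence is measured by empirical mutual information, and validation uses an illustrative average mutual-information criterion.

# Sketch of an FLTM-style agglomerative construction (hypothetical helpers;
# the real algorithm learns an LCM with EM and imputes the latent variable
# by probabilistic inference).
from collections import Counter
from itertools import combinations
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def greedy_cliques(nodes, strong_pairs):
    """Greedily partition nodes into cliques of the 'strong dependence' graph."""
    clusters, remaining = [], list(nodes)
    while remaining:
        clique = [remaining.pop(0)]
        for v in list(remaining):
            if all((min(u, v), max(u, v)) in strong_pairs for u in clique):
                clique.append(v)
                remaining.remove(v)
        clusters.append(clique)
    return clusters

def impute_latent(cluster, data):
    """Surrogate for EM + inference: the latent value is the frequency rank
    of the children's joint configuration, with cardinality capped at 4."""
    configs = list(zip(*(data[v] for v in cluster)))
    ranks = {c: i for i, (c, _) in enumerate(Counter(configs).most_common())}
    return [min(ranks[c], 3) for c in configs]

def learn_fltm(data, dep_threshold=0.1, info_threshold=0.1, max_iter=10):
    """Return the (parent, child) edges of the learned hierarchical structure."""
    edges, active, n_latent = [], sorted(data), 0
    for _ in range(max_iter):
        # Complete graph of pairwise dependences, thresholded.
        strong = {(min(u, v), max(u, v))
                  for u, v in combinations(active, 2)
                  if mutual_information(data[u], data[v]) >= dep_threshold}
        merged = False
        for cluster in greedy_cliques(active, strong):
            if len(cluster) < 2:
                continue                    # singletons stay isolated
            h = f"H{n_latent}"              # candidate latent variable
            data[h] = impute_latent(cluster, data)
            # Validation: H must capture enough information from its children.
            avg_mi = sum(mutual_information(data[h], data[v])
                         for v in cluster) / len(cluster)
            if avg_mi >= info_threshold:
                edges += [(h, v) for v in cluster]
                active = [v for v in active if v not in cluster] + [h]
                n_latent, merged = n_latent + 1, True
        if not merged:                      # no cluster validated: stop
            break
    return edges

# Toy usage: two strongly dependent pairs, (X0, X1) and (X2, X3).
rows = [(0, 0, 1, 1), (1, 1, 0, 0), (0, 0, 0, 0),
        (1, 1, 1, 1), (0, 0, 1, 1), (1, 1, 0, 0)]
data = {f"X{i}": [r[i] for r in rows] for i in range(4)}
print(learn_fltm(data))  # [('H0', 'X0'), ('H0', 'X1'), ('H1', 'X2'), ('H1', 'X3')]

In this sketch, a validated latent variable replaces its children in the set of active variables, which is what makes the procedure agglomerative; singleton and unvalidated clusters simply survive to the next iteration, and the loop stops as soon as no cluster is validated.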