In the first category, the algorithms explore the search space through a local search
strategy and optimize a score, such as the BIC score [15]. Zhang proposed a greedy
algorithm to navigate the structure search space [16]. This algorithm is coupled
with a hill-climbing procedure to adjust the cardinality of the latent variables. A more
efficient variant of this algorithm has recently been proposed [17]. In this first cate-
gory, structural expectation maximization (SEM) has also been adapted to the case of
Bayesian networks with latent variables. SEM successively optimizes θ conditionally
on the structure S, then optimizes S conditionally on θ. Since parameter learning is a
time-consuming step in this framework, Zhang and Kocka have adapted a procedure,
called local EM, to optimize only the variables whose connection or cardinality has been
modified in the transition from the former to the current model [18]. In one of the two LTM-
based approaches dedicated to LD modeling, Zhang and Ji use a set of LCMs and apply
a SEM strategy [1]. The number of LCMs has to be specified. To avoid getting trapped
in local optima while running the EM algorithm to learn a set of latent models, these
authors have adapted a simulated annealing approach.
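To make this alternation concrete, the sketch below fits a single latent class model (LCM) with binary observed variables by EM and greedily adjusts the cardinality of the latent variable using a BIC score. It is only a minimal toy version of the score-based scheme described above, not the cited algorithms; the function names (em_lcm, adjust_cardinality) and the settings (n_iter, k_max) are choices made for this illustration.

import numpy as np

def em_lcm(X, k, n_iter=100, seed=0):
    """EM for a latent class model: one latent variable H with k states
    and d binary observed variables.  X is an (n, d) array of 0/1 values."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = rng.dirichlet(np.ones(k))                 # P(H)
    theta = rng.uniform(0.25, 0.75, size=(k, d))   # P(X_j = 1 | H = h)
    for _ in range(n_iter):
        # E step: responsibilities P(H = h | x) for every sample
        log_joint = (np.log(pi) + X @ np.log(theta).T
                     + (1 - X) @ np.log(1 - theta).T)
        resp = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: re-estimate pi and theta from expected counts
        nk = resp.sum(axis=0) + 1e-9
        pi = nk / n
        theta = np.clip((resp.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    # log-likelihood of the fitted model (log-sum-exp over latent states)
    log_joint = (np.log(pi) + X @ np.log(theta).T
                 + (1 - X) @ np.log(1 - theta).T)
    m = log_joint.max(axis=1, keepdims=True)
    loglik = float((m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))).sum())
    return pi, theta, loglik

def adjust_cardinality(X, k_max=6):
    """Greedy scan over the cardinality k of the latent variable,
    keeping the model with the best BIC score (higher is better here)."""
    best = None
    for k in range(2, k_max + 1):
        pi, theta, loglik = em_lcm(X, k)
        n_params = (k - 1) + k * X.shape[1]        # free parameters of the LCM
        bic = loglik - 0.5 * n_params * np.log(len(X))
        if best is None or bic > best[0]:
            best = (bic, k, pi, theta)
    return best

A SEM-style learner would embed such a parameter-fitting step inside a local search over structures, rescoring only the parts of the model touched by each structural move, in the spirit of local EM.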
The above score-based approaches require the computation of the maximum likelihood
in the presence of latent variables, a computationally prohibitive task. Thus, various
methods based on the clustering of variables have been implemented. They all construct
the model following an ascending (bottom-up) strategy, and they all rely on the mutual
information (MI) criterion to identify clusters of dependent variables. In turn, these
methods may be subcategorized into binary- and non-binary-based approaches.
Hwang and co-workers' learning algorithm is dedicated to binary trees and binary
latent variables [19]. Notably, the trees are possibly augmented with connections
between siblings, that is, nodes sharing the same parent in the immediately upper
layer. Also confining themselves to binary trees, Harmeling and Williams have
proposed two learning algorithms [20]. One of them approximates the MI between a
latent variable H and any other variable X through a linkage criterion (single,
complete or average) applied to X and the variables in the cluster subsumed by H.
A variant of this first algorithm locally infers the data corresponding to each latent
variable; it therefore becomes possible to compute the MI criterion exactly between
a latent variable and any other variable.
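The linkage-based approximation can be illustrated as follows: variables are greedily agglomerated into a binary tree, and the affinity between two clusters is obtained by aggregating pairwise MI values with a single, complete or average linkage rule. This is a simplified sketch under assumed conventions (integer-coded discrete variables, empirical MI in nats), not the published procedure of Harmeling and Williams; mutual_information and build_binary_tree are names introduced here.

from itertools import combinations
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete
    variables given as integer-coded sample vectors."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)
    pxy = joint / len(x)
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def build_binary_tree(X, linkage="average"):
    """Greedy agglomeration of the columns of X into a binary tree:
    at each step, merge the two clusters with the highest linkage MI."""
    d = X.shape[1]
    mi = [[mutual_information(X[:, i], X[:, j]) for j in range(d)]
          for i in range(d)]
    agg = {"single": max,                      # max pairwise MI across clusters
           "complete": min,                    # min pairwise MI across clusters
           "average": lambda v: sum(v) / len(v)}[linkage]
    clusters = [[j] for j in range(d)]         # each variable starts alone
    tree = list(range(d))                      # nested tuples encode the tree
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            link = agg([mi[i][j] for i in clusters[a] for j in clusters[b]])
            if best is None or link > best[0]:
                best = (link, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        tree[a] = (tree[a], tree[b])
        del clusters[b], tree[b]
    return tree[0]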
Two approaches have been proposed to circumvent binary tree-based structures.
Wang and co-workers first build a binary tree; then they apply regularization and simpli-
fication transformations which may result in subsuming more than two nodes through
a latent variable [21]. In their approach devoted to LD modeling, Mourad and colla-
borators implement the clustering of variables through a partitioning algorithm [3]; the
latter yields cliques of pairwise dependent variables. Moreover, this method controls
information fading as the level increases in the hierarchy, which generally results
in a forest of LTMs (instead of a single LTM).
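As a minimal sketch of this clique-oriented alternative, the fragment below thresholds pairwise MI to decide which variables are dependent and then greedily grows cliques of pairwise dependent variables. It reuses the mutual_information helper from the previous sketch; the seeding order and the mi_threshold cutoff are assumptions of this illustration, not the actual partitioning algorithm used by Mourad and collaborators.

def partition_into_cliques(X, mi_threshold=0.05):
    """Greedy partition of the columns of X into cliques of pairwise
    dependent variables: i and j are deemed dependent when their
    empirical MI exceeds mi_threshold (an assumed, data-dependent cutoff)."""
    d = X.shape[1]
    dependent = [[mutual_information(X[:, i], X[:, j]) > mi_threshold
                  for j in range(d)] for i in range(d)]
    unassigned = set(range(d))
    cliques = []
    while unassigned:
        seed = min(unassigned)                 # arbitrary deterministic seed
        clique = [seed]
        for v in sorted(unassigned - {seed}):
            # add v only if it is pairwise dependent with every current
            # member, so the growing set remains a clique
            if all(dependent[v][u] for u in clique):
                clique.append(v)
        unassigned -= set(clique)
        cliques.append(clique)
    return cliques

Each clique can then be subsumed by a new latent variable, and repeating the process level by level, with a control on information fading, yields a forest of LTMs rather than a single tree, as described above.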
Two of the above-cited methods have been shown to be tractable, and scalability
reports are available: Hwang and co-workers' approach was able to handle 6,000
variables and around 60 observations, while the scalability of the FLTM construction
by Mourad et al. has been shown on benchmarks describing 10^5 variables and 2,000
individuals.