Forests of Latent Tree Models to Decipher Genotype-Phenotype Associations - Biomedical Engineering Systems and Technologies


of the physical constraint δ. The physical constraint imposed by the sliding window size δ makes it possible to adjust the variable bandwidth of the sparse dependence matrix.
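As an illustration of how a window constraint leads to a banded, sparse set of pairwise comparisons, the sketch below lists only the pairs of variables lying within δ of each other. It assumes that δ is expressed in the same physical unit as the variable positions (for instance base pairs) and that positions are sorted; the function name `pairs_within_window` and the toy positions are hypothetical and not taken from the original work.

```python
def pairs_within_window(positions, delta):
    """Return index pairs (i, j), i < j, whose positions differ by at most delta.

    Assumption: `positions` is sorted in increasing order and `delta` is
    expressed in the same unit as the positions.
    """
    pairs = []
    start = 0  # left edge of the sliding window
    for j, pos_j in enumerate(positions):
        # Move the left edge until it lies within delta of position j.
        while pos_j - positions[start] > delta:
            start += 1
        pairs.extend((i, j) for i in range(start, j))
    return pairs


# Toy example: five variables, hypothetical positions, delta = 300.
positions = [100, 150, 400, 420, 900]
pairs = pairs_within_window(positions, delta=300)
print(pairs)
```

Only the listed pairs would need a dependence value; every other entry of the dependence matrix is left empty, which is what makes the matrix sparse with a position-dependent bandwidth.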

It has to be noted that, unlike SNPs, latent variables are not characterized by a physical location on the chromosome. In this specific case, the locations of the SNPs subsumed by a given latent variable are averaged to provide the location of this latent variable.
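The averaging rule can be written in a couple of lines. This is a minimal sketch: the function name `latent_location` and the example positions are hypothetical, and the input is assumed to be the list of positions of the SNPs subsumed by the latent variable.

```python
def latent_location(snp_locations):
    """Location of a latent variable: mean of the locations of its subsumed SNPs."""
    return sum(snp_locations) / len(snp_locations)


# Three SNPs at hypothetical positions 1000, 1200 and 1700.
location = latent_location([1000, 1200, 1700])
print(location)
```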

Data Imputation for Latent Variables. Data imputation is performed locally, that is, within the LCM rooted in the latent variable and whose leaves are the variables in the cluster. For simplicity, the cardinality of the latent variable is estimated as an affine function of the number of leaves. Parameter learning is first performed in this LCM, through the EM algorithm. This step yields the marginal distribution of the latent variable and the conditional distributions of the child variables. Therefore, (linear) probabilistic inference can be carried out, based on the following principle:

\[
P(H = c \mid x^j) = \frac{P(H = c) \, \prod_{i=1}^{p} P(x_i^j \mid H = c)}{\sum_{c'=1}^{k} P(H = c') \, \prod_{i=1}^{p} P(x_i^j \mid H = c')}
\]

with $k$ the cardinality of latent variable $H$, $c$ a possible value for $H$, $j$ an observation, i.e. an individual, and $x^j$ the vector of values $\{x_1^j, ..., x_p^j\}$ corresponding to the variables in the cluster $\{X_1, ..., X_p\}$.
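The inference principle above amounts to Bayes' rule in a model where the child variables are independent given H. The following sketch computes the posterior of H for one individual and derives an imputed value from it. The data layout (`prior[c]`, `cond[i][c][v]`), the function names and the toy parameters are assumptions made for illustration; in particular, the text does not state here whether the imputed value is the most probable state or a draw from the posterior, so both options are shown.

```python
import random


def posterior(prior, cond, x):
    """Return [P(H = c | x) for c in 0..k-1] in a latent class model.

    prior[c]      : P(H = c)
    cond[i][c][v] : P(X_i = v | H = c)
    x[i]          : observed value of X_i for one individual
    """
    joint = []
    for c, p_c in enumerate(prior):
        p = p_c
        for i, v in enumerate(x):
            p *= cond[i][c][v]  # children are independent given H
        joint.append(p)
    total = sum(joint)  # denominator: sum over all states of H
    return [p / total for p in joint]


def impute(prior, cond, x, rng=None):
    """Impute H for one individual: most probable state, or a posterior draw if rng is given."""
    post = posterior(prior, cond, x)
    states = range(len(post))
    if rng is None:
        return max(states, key=post.__getitem__)
    return rng.choices(states, weights=post)[0]


# Toy LCM: k = 2 latent states, p = 2 binary children (hypothetical parameters).
prior = [0.5, 0.5]
cond = [
    [[0.9, 0.1], [0.2, 0.8]],  # P(X_1 | H = 0), P(X_1 | H = 1)
    [[0.8, 0.2], [0.3, 0.7]],  # P(X_2 | H = 0), P(X_2 | H = 1)
]
post = posterior(prior, cond, x=[0, 0])
h_map = impute(prior, cond, x=[0, 0])
h_draw = impute(prior, cond, x=[0, 0], rng=random.Random(0))
```

The cost for one individual is proportional to k times p, since each of the k states requires one product over the p children.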

Local Parameter Learning. In parallel with structure growing, the parameters of the forest of LTMs are learned locally (see Subsection 4.2). At a given iteration, for any variable identified as a leaf node in an LCM (corresponding to a cluster), the current marginal distribution of this variable is replaced with its conditional distribution learnt in the LCM. Thus, during the bottom-up construction of the FLTM, marginal distributions are successively replaced with conditional distributions.

Validation of Latent Variables. The subsumption of the candidate cluster into the latent variable H is validated through a criterion averaging a normalized dependence measure between H and each of H's child nodes:

\[
\mathrm{Criter} = \frac{1}{|C_H|} \sum_{X_i \in C_H} \frac{\mathcal{I}(X_i, H)}{\min\big(\mathcal{H}(X_i), \mathcal{H}(H)\big)} \geq \tau_{latent},
\]

with $|C_H|$ the size of cluster $C_H$, $\mathcal{I}$ the mutual information and $\mathcal{H}$ the entropy.
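A minimal sketch of such a validation test follows. It assumes that the dependence measure is the mutual information between each child and H, normalized by the smaller of the two entropies, and that both quantities are estimated empirically from the observed child values and the imputed values of H. The function names, the toy data and the threshold value 0.3 are hypothetical.

```python
from collections import Counter
from math import log


def entropy(xs):
    """Empirical entropy (in nats) of a sequence of discrete values."""
    n = len(xs)
    return -sum((m / n) * log(m / n) for m in Counter(xs).values())


def mutual_information(xs, ys):
    """Empirical mutual information, computed as H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def validation_criterion(children, h):
    """Average over the children of I(X_i, H) / min(H(X_i), H(H))."""
    h_ent = entropy(h)
    scores = []
    for x in children:
        denom = min(entropy(x), h_ent)
        # A constant variable carries no information: score it as 0.
        scores.append(mutual_information(x, h) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: four individuals, imputed H and two children.
h = [0, 0, 1, 1]
x1 = [0, 0, 1, 1]  # fully determined by H: normalized score 1
x2 = [0, 1, 0, 1]  # empirically independent of H: normalized score 0
crit = validation_criterion([x1, x2], h)

tau_latent = 0.3  # hypothetical threshold
accepted = crit >= tau_latent
```

Normalizing by the smaller entropy keeps each term between 0 and 1, so the same threshold can be applied to clusters whose variables have different cardinalities.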

4.3 Role of Parameters

In the forest of LTMs, the subsumption process is controlled through the thresholds τ_pairwise and τ_latent, and the constraint δ. No latent variable is allowed to subsume variables which are not highly pairwise dependent (τ_pairwise) or which are in regions too far from one another (δ); τ_latent controls bottom-up information fading through the hierarchy. τ_pairwise, τ_latent and δ thus determine the number of connected components (trees) and the number of layers in the forest. These three parameters govern the trade-off between faithfulness to the underlying reality and tractability of the modeling.
