Biomedical Engineering Reference
In-Depth Information
$P(X_i, X_j) = P(X_i)\,P(X_j)$ and the dependent model $P(X_i \mid X_j)\,P(X_j)$:

$$I(X_i, X_j) = \bigl(H(X_i) + H(X_j)\bigr) - \bigl(H(X_i \mid X_j) + H(X_j)\bigr) = H(X_i) - H(X_i \mid X_j).$$

The larger the difference between entropies, the higher the dependence.
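As an illustration, the identity I(X_i, X_j) = H(X_i) - H(X_i | X_j) can be checked numerically. The sketch below is not taken from the source: it assumes plug-in (empirical frequency) estimates of the entropies, uses the chain rule H(X | Y) = H(X, Y) - H(Y), and runs on hypothetical toy vectors; the function names are illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(x, y):
    """H(X | Y) = H(X, Y) - H(Y), estimated from paired samples."""
    return entropy(list(zip(x, y))) - entropy(y)


def mutual_information(x, y):
    """I(X, Y) = H(X) - H(X | Y), as in the definition above."""
    return entropy(x) - conditional_entropy(x, y)


# Hypothetical toy data (assumption, for illustration only).
x = [0, 0, 1, 1]
y_independent = [0, 1, 0, 1]  # every (x, y) pair equally frequent
y_copy = list(x)              # fully determined by x

mi_independent = mutual_information(x, y_independent)
mi_copy = mutual_information(x, y_copy)
```

With these toy vectors, the independent pair yields zero mutual information, while the copied variable yields a mutual information equal to the entropy of `x`, the two extremes of the dependence scale described above.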
Definition 5 (Allele, SNPs, Haplotype). Due to the presence of pairs of chromosomes
in the human genome, the DNA at a given chromosome locus (SNP) may either be
described through a pair of variants (alleles or phased data) at the finer description level
or through a unique variant (unphased data). As SNPs are biallelic, only two alleles are
encountered at the corresponding loci (instead of the 4 possible nucleotides A, T, C, G).
Thus, SNPs are discrete variables whose three possible values may be coded as, say,
0, 1 and 2, to respectively account for aa, {Aa, aA} (usually not distinguishable) and
AA, where A and a are the two alleles. A haplotype is defined as a sequence of alleles.
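The 0/1/2 coding in Definition 5 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, alleles are assumed to be written as the single characters `a` and `A`, and a haplotype is assumed to be a string of such characters.

```python
def encode_genotype(genotype, allele_a="a", allele_A="A"):
    """Code an unphased biallelic genotype as 0 (aa), 1 (Aa or aA) or 2 (AA).

    The code is simply the number of copies of allele `A`, so the two
    heterozygous orderings Aa and aA collapse to the same value.
    """
    if len(genotype) != 2 or any(al not in (allele_a, allele_A) for al in genotype):
        raise ValueError(f"not a biallelic genotype: {genotype!r}")
    return sum(1 for al in genotype if al == allele_A)


def genotypes_from_haplotypes(hap1, hap2):
    """Combine two haplotypes (phased data) into unphased 0/1/2 codes per locus."""
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must cover the same loci")
    return [encode_genotype(x + y) for x, y in zip(hap1, hap2)]


# Hypothetical example: two haplotypes over three SNP loci.
codes = genotypes_from_haplotypes("AaA", "aaA")
```

Going from phased to unphased data loses information: the heterozygous pairs `Aa` and `aA` map to the same code, which is why they are described as usually not distinguishable.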
3 Motivation and Related Work
3.1 Motivation
To tackle the difficult problem of disease association detection, several algorithms
from the machine learning domain have been proposed. Some of them use PGMs
[5,6]. Recently, forests of latent tree models have been investigated for LD modeling
purposes [3]. A forest of latent tree models (FLTM) is a forest whose trees are LTMs
(see Figure 1). FLTMs generalize LTMs, since the variables are not constrained to be
dependent upon one another, either directly or indirectly. Thus, FLTMs can describe a
larger set of configurations than LTMs.
When modeling such highly correlated variables as those in genotypic data, the
challenge is all the more crucial for downstream analyses such as the study and visualization of
linkage disequilibrium, the mapping of disease susceptibility genetic patterns and the study of
population structure. Most notably, the benefits of using FLTMs to model LD rely on
their ability to account for multiple degrees of SNP dependence and to deal naturally
with the fuzzy nature of LD block boundaries. As will be emphasized further, this latter
advantage results from the FLTM learning algorithm, which does not require that the
SNPs subsumed by the same latent variable be neighbouring SNPs (along the genome).
3.2 Probabilistic Graphical Models to Model Linkage Disequilibrium
The FLTM-based model is meant as an improved alternative to other PGM-based
works addressing LD modeling. Besides the learning of parameters ( θ ), that is, a priori
and conditional probabilities for Bayesian networks, and probability distributions for
cliques and separators for Markov random fields, the most challenging task in PGM
learning is structure inference. Thomas and Camp pioneered the use of PGMs to model
LD [7]. To this end, their approach relies on the general class of decomposable
Markov random fields (DMRFs). Decomposable graphs allow the efficient computation
of the likelihood of the structure, given the data. Thus, structure learning is easily
performed by navigating the structure space while optimizing a log-likelihood-based
score. To explore the DMRF space, operations based on connection or disconnection
 