Biomedical Engineering Reference
5 Evaluation
The application software, CFHLC+, is available at
http://sites.google.com/site/raphaelmourad/Home/programmes. It is developed in C++
and relies on the ProBT library dedicated to Bayesian networks
(http://bayesian-programming.org).
The algorithm was tested on datasets describing 10^5 SNPs for 2000 individuals. With
the first version, the running time was around 15 hours for an arbitrary window size of
100 SNPs. With the sliding window size δ set to 0.5 Mb, a reasonable choice to
capture LD, the new algorithm runs in less than 12 hours. It should be emphasized
that, since the algorithm runs EM with 10 restarts, this represents a significant
improvement over the initial version. Finally, the running time was shown to be
quasi-linear in the number of SNPs and linear in the sliding window size. These
experiments are reported in [3], together with an examination of robustness with respect
to parameter adjustment.
FLTM was shown to faithfully model linkage disequilibrium (LD). Owing to its hierarchical
structure, the multiple layers of an FLTM are expected to describe various degrees of
LD strength. To check this property, two matrices were compared for a given
genomic region. First, the standard triangular matrix M_c of pairwise dependences
(r^2 coefficient) between SNPs was calculated. Then, for each pair of SNPs, the
latent variable representing the lowest common ancestor (LCA) was identified. In
addition, the mean r^2 is easily computed over all latent variables located at the
same level of the FLTM hierarchy. Each cell of the second matrix, M_d, was thus
assigned the mean r^2 measure associated with the level of the LCA. For visual
comparison, a color palette whose shade darkens with increasing dependence was
applied to M_c, whereas a discretized palette was applied to M_d. The visual
comparison of the two plots clearly showed that the FLTM faithfully reflects the
variety of LD strengths (see [3]).
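The comparison of M_c and M_d can be illustrated with a minimal Python sketch. It is not the CFHLC+ implementation, and several points are assumptions made here for illustration: r^2 is taken as the squared Pearson correlation of genotype codes (0/1/2), the forest is given as a hypothetical child-to-parent map, the mean r^2 of a level is taken over the SNP pairs whose LCA lies at that level, and pairs with no common ancestor are simply left out of M_d. The names r2, lca, ld_matrices and the toy data are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def r2(x, y):
    """Squared Pearson correlation of two genotype vectors (assumed LD measure)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP: correlation undefined, use 0
        return 0.0
    return sxy * sxy / (sxx * syy)


def ancestors(node, parent):
    """Strict ancestors of a node in the forest, nearest first."""
    chain = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def lca(a, b, parent):
    """Lowest common ancestor of a and b, or None if they lie in different trees."""
    anc_b = set(ancestors(b, parent))
    for node in ancestors(a, parent):
        if node in anc_b:
            return node
    return None


def ld_matrices(genotypes, parent, level):
    """Return (M_c, M_d) as dicts keyed by SNP pairs (upper triangle only)."""
    m_c, pair_level, by_level = {}, {}, {}
    for a, b in combinations(sorted(genotypes), 2):
        value = r2(genotypes[a], genotypes[b])
        m_c[(a, b)] = value
        anc = lca(a, b, parent)
        if anc is not None:
            pair_level[(a, b)] = level[anc]
            by_level.setdefault(level[anc], []).append(value)
    # Mean r^2 per hierarchy level (assumed reading of the protocol).
    level_mean = {lvl: mean(vals) for lvl, vals in by_level.items()}
    m_d = {pair: level_mean[lvl] for pair, lvl in pair_level.items()}
    return m_c, m_d


# Toy forest: H1 and H2 are level-1 latent variables, H3 is a level-2 one.
parent = {"s1": "H1", "s2": "H1", "s3": "H2", "s4": "H2", "H1": "H3", "H2": "H3"}
level = {"H1": 1, "H2": 1, "H3": 2}
genotypes = {
    "s1": [0, 1, 2, 0, 1, 2],
    "s2": [0, 1, 2, 0, 1, 2],
    "s3": [0, 0, 2, 1, 1, 2],
    "s4": [0, 1, 2, 1, 0, 2],
}
m_c, m_d = ld_matrices(genotypes, parent, level)
```

In this toy forest, the pairs (s1, s2) and (s3, s4) share a level-1 ancestor and therefore receive the same M_d value, while every cross pair is assigned the mean attached to level 2. Plotting M_c with a continuous palette and M_d with one color per level would give the kind of side-by-side view described above.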
It was also shown that FLTM provides a compact and interpretable view of LD for
the geneticist. Low-level latent variables represent short-range LD and are
interpreted as shared haplotype ancestry. High-level latent variables correspond to
long-range LD, induced by population admixture or natural selection. The flexibility
of FLTM was highlighted in [23], where short-range, long-range and chromosome-wide
linkage disequilibrium were modeled and visualized.
Equally important for genetic association purposes is the dimension reduction
aspect, together with its possible consequence, bad subsumption. Drastic reductions
(about 85%) are observed as a rule (see [3]). However, the quality of the information
that latent variables carry about their child variables is expected to decrease in a
bottom-up fashion. Now that the soundness of FLTM for LD modeling has been assessed,
a demonstration of its ability to capture genetic associations is still required.
6 Protocol to Assess the Suitability of FLTM to Association Genetics
The objective of the study is to investigate how information about causality fades from
bottom to top in the hierarchy and what are the trends regarding the ratios of latent
 