Biomedical Engineering Reference
In-Depth Information
SAT formula can be significantly reduced based on the information achieved with
the computation of the lower bound.
Example 7.5. (Lower Bounds) Consider the set of genotypes
G Df g 1 , g 2 , g 3 , g 4 ,
g 5 g ,whereg 1 D 1010, g 2 D 0212, g 3 D 1210, g 4 D 2120, g 5 D 2222. A clique of
incompatible genotypes is f g 1 ;g 2 g ,whereg 2 contributes with 2 to the lower bound
and g 1 contributes with 1 because g 1 is homozygous. Therefore, these genotypes
are selected and the value of the clique lower bound is 3. The second lower bound
increases the value of the lower bound by 2. Genotypes g 3 and g 4 are selected be-
cause they are heterozygous at positions where every genotype already selected and
compatible with them have the same homozygous value. Consequently, at least five
different haplotypes f h 1 , h 2 , h 3 , h 4 , h 5 g are required to explain
. Genotype g 1 can
be associated with f h 1 ;h 1 g and g 2 can be associated with f h 2 ;h 3 g . Genotypes g 3
and g 4 can be associated, respectively, with h 4 and h 5 .Onlyg 5 does not contribute
to increasing the lower bound.
G
7.5.3
Upper Bounds
The computation of upper bounds is also an important issue in some solvers [ 1 , 44 ].
In general, every heuristic algorithm to the HIPP problem can be used to produce an
upper bound, for example, the delayed haplotype selection (DS) algorithm [ 34 ]and
the CollHaps heuristic approach [ 43 ]. DS was suggested for using binary search in
SHIPs and CollHaps is used by some exact models [ 1 ]. Both DS and Collhaps are
based on the Clark's method.
Clark's method [ 6 ] is a well-known algorithm to solve the haplotype inference
problem. This method starts by identifying genotypes with zero or one heterozy-
gous sites, which have only one possible explanation. Then, the method attempts to
explain the remaining genotypes with at least one of the haplotypes already iden-
tified (Clark's rule). This may eventually require the inference of new haplotypes
which will be added to the set of haplotypes. The key point to note is that there are
many ways to extend the set of haplotypes, since for genotypes with more than one
heterozygous site there are a few possible explanations.
Clearly, the Clark's method may be used to compute an upper bound to the HIPP
problem. However, this method is often too greedy. DS addresses the main draw-
back of Clark's method, avoiding the excessive greediness. Collhaps is based on
a generalization of the Clark's rule, called the collapse rule. These two heuristic
approaches provide very accurate approximation to the pure parsimony solution.
7.6
Experimental Evaluation
This section presents a summary of the experimental results, obtained in a set
of 1183 instances, including both synthetic and real data. Synthetic data include
Search WWH ::




Custom Search