Biomedical Engineering Reference
In-Depth Information
Example 7.2. (HIPP) Consider the set of genotypes
G Df g 1 ;g 2 ;g 3 gD
f 022, 221, 222 g . There are solutions using six different haplotypes:
H 1 D
f 000; 001; 010; 011; 101; 111 g , such that g 1 D 001 ˚ 010, g 2 D 011 ˚ 101
and g 3 D 000 ˚ 111. However, the HIPP solution only requires four distinct haplo-
types:
H 2 Df 000, 011, 101, 111 g such that g 1 D 011 ˚ 000, g 2 D 011 ˚ 101 and
g 3 D 011 ˚ 100.
In general, there may exist more than one solution to the problem.
Finding a solution to the HIPP problem is an APX-hard (and consequently, NP-
hard) problem [ 28 ]. This means that not only there is no polynomial time algorithm
to solve the problem, but also does not exist a polynomial time approximation
scheme, unless P D NP.
7.3
Related Work
A significant number of methods have been developed to solve the pure parsimony
haplotyping problem, pursuing the efficiency goal.
Different constraint techniques have been applied to solving the HIPP problem.
The most used technique is integer linear programming (ILP) [ 1 - 3 , 5 , 19 , 20 , 28 ].
Other techniques are based on branch-and-bound [ 27 , 44 ], Boolean satisfiability
(SAT) [ 29 , 36 ], pseudo-Boolean optimization (PBO) [ 16 ], answer set programming
(ASP) [ 12 ] and constraint programming (CP) [ 37 ].
The first HIPP model was proposed by Gusfield [ 19 ] and is an integer linear pro-
graming approach. The main drawback of Gusfield's model, RTIP, is its size because
the model has an exponential number of variables and constraints. RTIP considers
enumerating all pairs of haplotypes which explain every genotype. Given that, for
each genotype g
, the number of haplotype pairs which explain g is 2 z ,where
z is the number of heterozygous positions of g, RTIP turns out to be an exponen-
tial model. Nonetheless, the model is simplified by not considering haplotype pairs
where both haplotypes cannot explain more than one genotype. Note that these hap-
lotype pairs can be removed from the formulation while maintaining the correctness
of the algorithm. This simplification makes the model practical in several situations,
specially when the level of recombination is high, because there exists more haplo-
type diversity. Even though, the model still has its exponentially as a drawback.
The RTIP model inspired a branch-and-bound algorithm to the HIPP problem,
known as HAPAR [ 44 ]. A more recent branch-and-bound algorithm is used to solve
an ILP model based on a set-covering formulation of the problem [ 27 ].
PolyIP [ 2 ] is a model proposed by Brown and Harrower, which is polynomial
on the number of genotypes and sites. Similar formulations were independently
suggested by other authors [ 20 , 28 ]. PolyIP associates two haplotypes with each
genotype. A Boolean variable is associated with each haplotype site and constraints
ensure that each haplotype pair explains the corresponding genotype. Moreover, for
each two haplotypes, the model needs variables defined as true if the haplotypes
are different. Furthermore, for each haplotype there exists a variable which is true
2 G
Search WWH ::




Custom Search