Haplotype Inference Using Propositional Satisfiability - Mathematical Approaches to Polymer Sequence Analysis and Related Problems

Biomedical Engineering Reference

In-Depth Information

: h kj _: s ki ^ : h kj _: s ki :

(7.1)

If a site g ij is 1, then the model requires, for k D 1;:::;r,

h kj _: s ki ^ h kj _: s ki :

(7.2)

Otherwise, one requires that the haplotypes explaining genotype g i

have opposite

values at site i . This is done by creating two variables, g ij

2f 0; 1 g and g ij

2f 0; 1 g ,

such that g ij

¤ g ij . In CNF, the model requires two clauses,

g ij

_ g ij

_: g ij :

: g ij

(7.3)

In addition, the model requires, for k D 1;:::;r,

h kj _: g ij

_: s ki ^ : h kj _ g ij

_: s ki

^ h kj _: g ij

_: s ki ^ : h kj _ g ij

_: s ki :

(7.4)

Clearly, for each value of i ,andfora and b, it is necessary that exactly one haplo-

type is used, and so exactly one selector variable be assigned value 1. This can be

captured with cardinality constraints,

r X

s ki D 1

(7.5)

k D 1

The SHIPs model can also be described using a matrix formulation G D S a

H ˚ S b H ,whereG is a n m matrix describing the genotypes, H is a r m

matrix of haplotype variables, S a and S b are the n r matrices of selector variables

and ˚ is the explanation operation.

Theorem 7.1. (Space Complexity) If a solution is found with r f

haplotypes, then

the number of constraints of the SAT model is

O .r f nm / ,whichis

O .n 2 m/ .In

O .n 2 C nm / .

addition, the number of variables is

O . nm C r f m C r f n/ ,whichis

The core SHIPs model is not effective in practice. As a result, several key

improvements have been developed, which are essential for obtaining significant

performance gains over existing approaches.

A crucial technique is the utilization of a tight lower bound estimate, which

reduces the number of iterations of the algorithm, but also effectively prunes the

search space. The computation of lower bounds is discussed in Sect. 7.5.2 .

One additional key technique for pruning the search space is motivated by

observing the existence of symmetry in the problem formulation. Consider two

haplotypes h k 1 and h k 2 , and the selector variables s k 1 i

, s k 2 i

, s k 1 i

and s k 2 i

.Further-

more, consider Boolean valuations v x and v y to the sites of haplotypes h k 1 and h k 2 .

Then, h v x

k 1

and h v y

k 2

1001, corresponds to h v y

k 1

and h v x

k 2

, with s k 1 i s k 2 i s k 1 i s k 2 i

Mathematical Approaches to Polymer Sequence Analysis and Related Problems

Search WWH ::

Custom Search

Home