Biomedical Engineering Reference
In-Depth Information
If g kj ยค 2 ^ g ij
D 2,thenR D .g kj , .p , a// and S
D t ij .
If g ij D 2 ^ g kj D 2,thenR D .p , q/ and S D .t ij , t kj /.
Moreover, in order to count the number of distinct haplotypes used, Boolean
variables u i
are defined such that u i
is assigned value 1 if haplotype h i
explaining
genotype g i is different from every haplotype which explain genotype g k with k<i.
The conditions on variables u i
are based on the conditions for the x i
variables,
x pa
^
ik ^ x pb
) u i :
(7.16)
ik
1 k<i
Finally, the cost function is given by
u i C u i
:
n X
min
(7.17)
i D 1
Theorem 7.2. (Space Complexity) Let n be the number of genotypes in
, m the
number of positions of each genotype and the number of heterozygous positions
of the instance. Then, the number of variables of the PBO model is
G
O .n 2 C / ,
O .n 2 C nm/ . In addition, the number of constraints of the
which corresponds to
O .n 2 m/ .
PBO model is
The RPoly model has been further improved, following the ideas used by SHIPs.
In particular, the utilization of lower bounds as a method for pruning the search
space shows to be very effective. The integration of a tight lower bound (Sect. 7.5.2 )
allows fixing the values of some variables u , and the clauses used for constraining
the value of u need not be generated. This allows the PBO solver to focus on the
remaining u variables, and the size of the generated PBO problem instances becomes
significantly smaller. For the more complex problem instances, the integration of
lower bound information reduces the size of the generated PBO instances by a factor
between 2 and 3, on average.
Finally, an additional improvement is the inclusion of cardinality constraints on
the x variables. Adding new constraints to a model is expected to prune the search
and therefore contributes to the efficiency of the solver.
7.4.4.1
Missing Data
Most often real genotype data contain a significant percentage of unknown data.
Even with modern automated DNA analysis techniques, generating data with miss-
ing alleles is not an uncommon situation [ 25 ].
One useful feature of the RPoly tool is to be able to deal with unspecified
genotype sites. Genotyping procedures often leave a percentage of missing geno-
type positions, and so haplotype inference tools need to be able to deal with missing
sites. RPoly can handle SNPs with unspecified values, inferring the values for the
missing sites and still guaranteeing a parsimonious solution.
Search WWH ::




Custom Search