Biomedical Engineering Reference
In-Depth Information
complemented sites can also be discarded. Two sites are complemented if whenever
one site has value 1, the other site has value 0 and vice versa.
By keeping information of the genotypes and sites discarded, it is straightforward
to construct a solution for the original set of genotypes once discovered a solution
to the simplified set of genotypes.
Example 7.4. (Structural Simplifications) Consider the following set of four geno-
types each with five sites:
G Df 02122; 12021; 02122; 20201 g . Note that the first
genotype is equal to the third genotype. Hence, the third genotype can be discarded.
In addition, note that the second and the fourth sites are equal. Also the first and the
third sites are complemented. Hence, the third and the fourth sites can be removed.
Consequently, the simplified set of genotypes is a set with three genotypes and three
sites: simplified . G / Df 022; 121; 201 g .
7.5.2
Lower Bounds
The computation of tight bounds is a key issue in several solvers, e.g., SHIPs
and HAPAR. This section describes two algorithms used to compute lower bounds
which have been proposed with the SHIPs algorithm [ 29 , 31 ].
The first algorithm relies on the incompatibility between genotypes [ 29 ]. Recall
that two genotypes are incompatible if there exists a site for which the value of one
genotype is 0 and the other genotype is 1. Otherwise, the genotypes are compatible .
Clearly, incompatible genotypes cannot be explained by common haplotypes. The
first lower bound procedure consists in finding a clique of incompatible genotypes
in a graph where each vertex is a genotype, and there is an edge between two geno-
types if they are incompatible. If the clique has k genotypes, then the number of
haplotypes required is guaranteed to be equal or greater than 2k ,where is the
number of homozygous genotypes in the clique.
The clique lower bound previously described can be improved taking into ac-
count the structure of the genotypes [ 31 ]. Let G S denote the set of selected geno-
types. At the beginning, G S corresponds to the set of genotypes in the clique of
incompatible genotypes. The goal is to find heterozygous sites g ij in genotypes
which do not belong to G S , such that all genotypes g 2 G S compatible with g i
have the same homozygous value in that position. Define the propositional function
.i; k/ to be true if g i and g k are compatible. (Clearly, is a symmetric relation.)
Thus, the goal is to find a genotype g i
G S such that there exists a position j
where g ij
D
2 and a value val
2f 0; 1 g
such that, for all g k 2
G S ,if.i; k/
then g kj D val, i.e.,
9 1 j m .g ij D 2
^9 val 2f 0;1 g 8 g k 2 G S ..i; k/ ) g kj D val//:
(7.19)
Then G S D G S [f g i g and the lower bound can be increased by one. The process is
iterated until there are no more genotypes g i G S which satisfy ( 7.19 ).
The interest in integrating a lower bound in SHIPs is twofold. First, the number
of iterations of the algorithm is reduced. Second, and more important, the size of the
Search WWH ::




Custom Search