Biomedical Engineering Reference
In-Depth Information
value 1 represents the mutant. A haplotype is then a string over the alphabet $\{0,1\}$.
Moreover, genotypes may be represented by extending the alphabet used for
representing haplotypes to $\{0,1,2\}$. Homozygous sites are then represented by values 0 or
1, depending on whether both haplotypes have value 0 or 1 at that site, respectively.
Heterozygous sites are represented by value 2. A genotype $g_i$ is explained by a pair
$(h_i^a, h_i^b)$ of haplotypes. This fact is represented by $g_i = h_i^a \oplus h_i^b$, with

$$
g_{ij} =
\begin{cases}
0 & \text{if } h_{ij}^a = h_{ij}^b = 0 \\
1 & \text{if } h_{ij}^a = h_{ij}^b = 1 \\
2 & \text{if } h_{ij}^a \neq h_{ij}^b
\end{cases}
$$

for each specific site $g_{ij}$, with $1 \le j \le m$.
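The site-wise rule above can be sketched in a few lines of Python. This is only an illustration: the function name `combine` and the choice to encode haplotypes and genotypes as strings of digits are assumptions made here, not notation from the chapter.

```python
def combine(h_a, h_b):
    """Return the genotype explained by the haplotype pair (h_a, h_b).

    Haplotypes are strings over {0,1}; the result is a string over {0,1,2}.
    (Hypothetical helper, named here for illustration only.)
    """
    if len(h_a) != len(h_b):
        raise ValueError("haplotypes must have the same number of sites")
    sites = []
    for a, b in zip(h_a, h_b):
        if a not in "01" or b not in "01":
            raise ValueError("haplotype sites must be 0 or 1")
        # Equal values give a homozygous site (0 or 1); different values give 2.
        sites.append(a if a == b else "2")
    return "".join(sites)


print(combine("00010", "01111"))  # 02212
```

The operation is symmetric, so `combine(h_a, h_b)` and `combine(h_b, h_a)` give the same genotype.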
Definition 7.1. (Haplotype Inference) Given a set $\mathcal{G}$ with $n$ genotypes, each with
size $m$, the haplotype inference problem aims at finding the set of haplotypes,
$\mathcal{H}$, which originate the genotypes in $\mathcal{G}$ and associating a pair of haplotypes
$(h_i^a, h_i^b)$, with $h_i^a, h_i^b \in \mathcal{H}$, to each genotype $g_i \in \mathcal{G}$,
such that $g_i = h_i^a \oplus h_i^b$.
Example 7.1. (Haplotype Inference) Consider genotype 02212 having five sites, of
which one SNP is homozygous with value 0, one SNP is homozygous with value
1 and the remaining three SNPs correspond to heterozygous sites. There are four
different possible explanations for this genotype: (00010, 01111), (00110, 01011),
(00111, 01010) and (00011, 01110).
For each genotype $g_i \in \mathcal{G}$ with $z$ heterozygous positions, there are $2^{z-1}$ possible
pairs of haplotypes which can explain $g_i$. Choosing the biologically correct haplotype
pair would be impossible without the implicit or explicit use of some genetic model
to guide the algorithm in constructing a solution. The coalescent model [22] states
that there is a unique ancestor for all individuals of the same population. In this
chapter, we consider the pure parsimony approach, which is indirectly related to the
coalescent genetic model.
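The count of $2^{z-1}$ candidate pairs can be checked by direct enumeration. The sketch below is an illustration under assumptions: the function name `explanations` and the string encoding are introduced here and do not come from the chapter. It fixes the first heterozygous site of the first haplotype to 0, so each unordered pair is produced exactly once.

```python
from itertools import product


def explanations(genotype):
    """List every unordered haplotype pair that explains the genotype.

    (Hypothetical helper; genotypes are strings over {0,1,2}.)
    """
    het = [j for j, s in enumerate(genotype) if s == "2"]
    if not het:
        # No heterozygous site: both haplotypes equal the genotype itself.
        return [(genotype, genotype)]
    pairs = []
    # The first heterozygous site is fixed to 0 in h_a, which leaves
    # 2^(z-1) free assignments for the remaining z-1 heterozygous sites.
    for bits in product("01", repeat=len(het) - 1):
        h_a = list(genotype)
        h_b = list(genotype)
        for j, bit in zip(het, ("0",) + bits):
            h_a[j] = bit
            h_b[j] = "1" if bit == "0" else "0"
        pairs.append(("".join(h_a), "".join(h_b)))
    return pairs


pairs = explanations("02212")
print(len(pairs))  # 4
for pair in pairs:
    print(pair)
```

For genotype 02212 (z = 3) this yields the four pairs listed in Example 7.1.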
7.2.1 Haplotype Inference by Pure Parsimony
The most explored combinatorial approach to the haplotype inference problem is
called haplotype inference by pure parsimony (HIPP) [19]. A solution to this problem
minimizes the total number of distinct haplotypes being used. The idea of searching
for the solution with the smallest number of haplotypes is biologically motivated by
the fact that individuals from the same population have the same ancestors and
mutations do not occur often. Moreover, empirical results provide support for this
method: the number of haplotypes in a large population is typically very small,
although genotypes exhibit a great diversity.
Definition 7.2. The HIPP problem consists in finding a minimum-size set $\mathcal{H}$ of
haplotypes that explain all genotypes in $\mathcal{G}$.
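To make Definition 7.2 concrete, the following sketch solves tiny HIPP instances by exhaustive search. It is a naive illustration, not the method developed in this chapter: it tries every combination of explaining pairs, so its running time grows exponentially with the number of heterozygous sites. The function names and the sample instance are assumptions made for this example.

```python
from itertools import product


def explanations(genotype):
    """List every unordered haplotype pair that explains the genotype."""
    het = [j for j, s in enumerate(genotype) if s == "2"]
    if not het:
        return [(genotype, genotype)]
    pairs = []
    # Fix the first heterozygous site to 0 in h_a to avoid mirrored pairs.
    for bits in product("01", repeat=len(het) - 1):
        h_a, h_b = list(genotype), list(genotype)
        for j, bit in zip(het, ("0",) + bits):
            h_a[j] = bit
            h_b[j] = "1" if bit == "0" else "0"
        pairs.append(("".join(h_a), "".join(h_b)))
    return pairs


def hipp_brute_force(genotypes):
    """Return (haplotype set, chosen pairs) using the fewest distinct haplotypes.

    Exhaustive search over one explaining pair per genotype (hypothetical
    helper, suitable only for very small instances).
    """
    best_set, best_choice = None, None
    for choice in product(*(explanations(g) for g in genotypes)):
        used = {h for pair in choice for h in pair}
        if best_set is None or len(used) < len(best_set):
            best_set, best_choice = used, choice
    return best_set, list(best_choice)


# Made-up instance: the genotype of Example 7.1 plus two homozygous genotypes.
haplotypes, chosen = hipp_brute_force(["02212", "00010", "01111"])
print(sorted(haplotypes))  # ['00010', '01111']
print(chosen[0])           # ('00010', '01111')
```

Here the two homozygous genotypes force haplotypes 00010 and 01111 into the solution, and the same two haplotypes also explain 02212, so the parsimonious solution uses only two distinct haplotypes.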