Biology Reference
In-Depth Information
The insight from Theorem 6.8 is that the solutions of a restricted form of the
Pure Parsimony problem are representable as a collection of paths. However, this
technique has two shortcomings. First, the longest path problem is NP-Complete,
making each step of the algorithm difficult. So, the technique describes the nature
of a solution but does not provide an efficient solution procedure. The second
shortcoming is that the algorithm is not capable of finding every optimal solution.
To see this, consider the following collection of genotypes,
!
"
1 = γ ( w 1 )=(2 , 0 ,
g
2 ,
2 ,
2 ,
2)
2 = γ ( w 2 )=(0 , 2 , 0 , 0 ,
g
2 ,
2)
3 = γ ( w 3 )=(
g
2 , 0 , 2 , 0 ,
2 , 0)
(6.7)
g
4 = γ ( w 4 )=(
2 , 0 , 0 , 2 , 0 ,
2)
#
5 = γ ( w 5 )=(
g
2 ,
2 ,
2 , 0 , 2 ,
2)
6 = γ ( w 6 )=(
g
2 ,
2 , 0 ,
2 ,
2 , 2).
=2 6 ,that η (
6 ,andthat
Assume that
|V|
V
)=
{−
1 , 1
}
E
is the largest edge set
i , h
i , g
i +1 if and only if
possible. Notice that a path may contain the sequence g
i has a value of 2 or
i +1
there is no SNP where g
2 and g
has the other value.
1 , h , g
3 exists in the
So, in the above example there is no h such that the path g
1
3
diversity graph because the first SNP of g
is a 2 and the first SNP of g
is a
1 , h , g
2 , is in the diversity graph
2. However, there is an h so that the path g
1 and g
2 have different values of 2 and
because there is no SNP where g
2.Ifwe
compare each pair of genotypes in a similar fashion, we find that the paths pass
through the genotypes as indicated in Fig. 6.3. From this figure we see that there
is not a path or cycle through every genotype, but that there are several two path
solutions. From Theorem 6.8 we know that φ (2) = 6+2 = 8.Uptoreversingthe
order of the genotypes, there are four optimal progressions through the genotypes,
see Table 6.1. Our algorithm finds the first solution indicated in Table 6.1, as the
first path is as long as possible. None of the other paths have this property, and so
the algorithm is not capable of finding these solutions.
Table 6.1. Ways in which the genotypes for the example
in (6.7) can be listed in two distinct paths.
First Path's
Second Path's
Genotype Progression
Genotype Progression
( g 1 , g 2 , g 3 , g 4 , g 5 )
( g 6 )
( g 1 , g 2 , g 4 , g 5 )
( g 3 , g 6 )
( g 1 , g 2 , g 3 , g 6 )
( g 4 , g 5 )
( g 6 , g 3 , g 4 , g 5 )
( g 1 , g 2 )
Our last discussion approaches the pure parsimony problem through lattice
theory and requires the more general
asdiscussedinSection6.2. Let
be a
binary relation such that 2
2, 2
0,
2
2,
2
0,and0
0.Then
Search WWH ::




Custom Search