Biology Reference
In-Depth Information
10.3.2 Fitting Trees: A Distance-Based Approach
With the previous comments in mind, one approach (a “distance-based approach”) to
solving the central problem of phylogenetic tree reconstruction is to
1. start with the set of (aligned) sequences s x ,
X ,
2. use an evolutionary model to derive a measure of evolutionary distance d
x
(
,
)
x
y
between x and y on the basis of the information encoded in s x and s y , and then
3. use the numerical data d
X , to try to reconstruct a phylogenetic
tree T on X which is consistent with this data.
(
x
,
y
),
x
,
y
Each of the three steps above is highly significant; for this chapter, we will be focusing
on the last step. In particular, since real sequence data contains “noise,” it is not usually
possible to find a tree which “fits” the data exactly, so in the last step above it becomes
necessary to look for a tree T on X which “best fits” the data d
X , and
we will be exploring some ways to do this in much greater detail. As preparation,
we want first to return to some mathematical formulations that will be instrumental
in what follows. We start with the definition of a dissimilarity measure (also called a
dissimilarity map).
Mathematically, for any finite set X ,a dissimilarity map D
(
x
,
y
),
x
,
y
=[
d
(
x
,
y
) ] x , y X on X
is simply a real-valued ( d
(
x
,
y
) R
for all x
,
y
X ) symmetric ( d
(
x
,
y
) =
d
(
y
,
x
)
for all x
,
y
X ) square matrix with diagonal entries zero ( d
(
x
,
x
) =
0 for all x
X ).
Exercise 10.9. Assign values to the as-yet unspecified entries d
(
i
,
j
)
of the matrix
D
=[
d
(
i
,
j
) ] i , j ∈[ 4 ]
below so as to make D a dissimilarity map on X
=[
4
]
.
d
(
1
,
1
)
13
3
13
13
0
d
(
2
,
3
)
2
D
=
d
(
3
,
1
)
π
d
(
3
,
3
)
d
(
3
,
4
)
13
2
d
(
4
,
3
)
0
Exercise 10.10. (Just for fun; the result will not be used or needed in the rest of this
chapter.) For those familiar with the linear algebraic interpretation of a real-valued
n by n square matrix A as a linear transformation T A
n
n , via the matrix
: R
R
n represented as an n by 1 column vector, what is the
effect of applying T D for D a dissimilarity map, to the standard bases vectors e 1 ,...,
e n R
product T A (v) =
A
v
,for
v R
n ?
As matrices, dissimilarity maps show up frequently in mathematics. For exam-
ple, by the polar decomposition theorem, every real-valued square matrix M can
be expressed as a sum of a symmetric and a skew-symmetric matrix via M
=
1
M T
1
M T
2 (
M
+
) +
2 (
M
)
,soif M begins as a zero-diagonal matrix, then the
1
M T
symmetric component,
2 (
M
+
)
, will be a dissimilarity map. As another exam-
ple, a map
δ : R × R R
is called a metric if
δ(
x
,
x
) =
0
,δ(
x
,
y
) = δ(
y
,
x
)
, and
δ(
. The last inequality is the well-known
triangle inequality , familiar from Euclidean geometry; think of points x
x
,
z
) δ(
x
,
y
) + δ(
y
,
z
)
for all x
,
y
,
z
R
,
y , and z as
lying on the vertices of a triangle. Re call that, for x
R , |
x
|
is the distance from x
 
Search WWH ::




Custom Search