Phylogenetic Tree Reconstruction: Geometric Approaches - Mathematical Concepts and Methods in Modern Biology

Biology Reference

In-Depth Information

10.3.2 Fitting Trees: A Distance-Based Approach

With the previous comments in mind, one approach (a “distance-based approach”) to

solving the central problem of phylogenetic tree reconstruction is to

1. start with the set of (aligned) sequences s x ,

∈

X ,

2. use an evolutionary model to derive a measure of evolutionary distance d

(

)

between x and y on the basis of the information encoded in s x and s y , and then

3. use the numerical data d

X , to try to reconstruct a phylogenetic

tree T on X which is consistent with this data.

(

∈

Each of the three steps above is highly significant; for this chapter, we will be focusing

on the last step. In particular, since real sequence data contains “noise,” it is not usually

possible to find a tree which “fits” the data exactly, so in the last step above it becomes

necessary to look for a tree T on X which “best fits” the data d

X , and

we will be exploring some ways to do this in much greater detail. As preparation,

we want first to return to some mathematical formulations that will be instrumental

in what follows. We start with the definition of a dissimilarity measure (also called a

dissimilarity map).

Mathematically, for any finite set X ,a dissimilarity map D

(

∈

(

) ] x , y ∈ X on X

is simply a real-valued ( d

(

) ∈ R

for all x

∈

X ) symmetric ( d

(

) =

(

)

for all x

∈

X ) square matrix with diagonal entries zero ( d

(

) =

0 for all x

∈

X ).

Exercise 10.9. Assign values to the as-yet unspecified entries d

(

)

of the matrix

(

) ] i , j ∈[ 4 ]

below so as to make D a dissimilarity map on X

]

⎡

⎣

⎤

⎦

(

)

−

(

)

(

)

(

)

(

)

−

(

)

Exercise 10.10. (Just for fun; the result will not be used or needed in the rest of this

chapter.) For those familiar with the linear algebraic interpretation of a real-valued

n by n square matrix A as a linear transformation T A

n , via the matrix

: R

→ R

n represented as an n by 1 column vector, what is the

effect of applying T D for D a dissimilarity map, to the standard bases vectors e 1 ,...,

e n ∈ R

product T A (v) =

,for

v ∈ R

n ?

As matrices, dissimilarity maps show up frequently in mathematics. For exam-

ple, by the polar decomposition theorem, every real-valued square matrix M can

be expressed as a sum of a symmetric and a skew-symmetric matrix via M

M T

2 (

) +

2 (

−

)

,soif M begins as a zero-diagonal matrix, then the

M T

symmetric component,

2 (

)

, will be a dissimilarity map. As another exam-

ple, a map

δ : R × R → R

is called a metric if

δ(

) =

,δ(

) = δ(

)

, and

δ(

. The last inequality is the well-known

triangle inequality , familiar from Euclidean geometry; think of points x

) δ(

) + δ(

)

for all x

∈ R

y , and z as

lying on the vertices of a triangle. Re call that, for x

∈ R , |

is the distance from x

Mathematical Concepts and Methods in Modern Biology

Search WWH ::

Custom Search

Home