Phylogenetic Tree Reconstruction: Geometric Approaches - Mathematical Concepts and Methods in Modern Biology

Biology Reference

In-Depth Information

phylogenetic X -tree T with edge-weighting

and D will be “close”

in some specified way, and in this sense, say a tree with a “best fit” has been found. To

discuss “closeness,” it would be helpful to have a notion of distance that could be used

to compare dissimilaritymaps D coming from sequence data with induced treemetrics

D T ,ω

for which D T ,ω

. Having discussed previously how to find distances between two points on a line

2 , or in three space

3 , and more generally in

n for n

1 by means

of a standard distance (via Pythagorean theorem analog), we might be motivated to

seek a way to regard both D and D T ,ω

, in the plane

as points, and then seek a distance between

those points that we would then try to minimize in order to find an edge-weighted

tree T that “best fits” the original data. Taking that step is our next focal point.

Since any matrix M

is really just a nicely ordered list of elements (i.e.,

listed by row and cross-listed by column), there is already an immediate way to

regard matrices as points (vectors): just start writing out the entries of the matrix

across row 1; when you get to the end, start adding the elements of row 2, and so

on, until all elements are listed. For example, if M is a 4

m i , j ]

4 matrix, then there is a

correspondence of M and a vector with 16

4 entries:

m i , j ]←→ (

m 1 , 1 ,

m 1 , 2 ,

m 1 , 3 ,

m 1 , 4 ,

m 2 , 1 ,

m 2 , 2 ,

m 2 , 3 ,

m 2 , 4 ,

m 3 , 1 ,...,

m 3 , 4 ,

m 4 , 1 ,...,

m 4 , 4 ).

(10.2)

Other orderings of the elements of M to create a 16-tuple could be taken, of course,

but as long as we are consistent this has no effect on our definition of distance.

Given the symmetry and existence of all zero-diagonal elements of any dissimi-

larity map D

(

) ] x , y ∈

X , there is a lot of redundancy in the entries; in fact, if

n 2

n , then for the non-zero entries of D , there are at most m

2 (

−

)

distinct

entries d

in the n by n matrix D . Starting with D , it is straightforward to create a

streamlined vector

(

)

m carrying the same information as D , by choosing an order-

ing for reading off, and then listing in that order, the elements of D above the diagonal.

This is illustrated for one choice of such an ordering below, for D as in Eq. ( 10.1 ):

v D in

Associated vector

v D = (

)

←→ (

d T ,w (

)).

Exercise 10.16. We have just seen how to take any n by n dissimilarity map D and,

having fixed an ordering of elements in D , turn it into a point (vector)

m ,

v D ∈ R

n 2

m , explain how to reverse this

for m

2 (

−

)

. Starting from any vector

process to obtain an n by n dissimilarity map D

. Apply this to the particular case

v = (

With this adjustment, it is now possible to represent any two n by n dissimilarity

maps D

)

.Is D v atreemetric?

D by vectors

m , where m

n 2

v D ,v D ∈ R

2 (

−

)

. One can use the

D ) =

d (

standard Euclidean distance measure

δ(

1 i < j n (

(

) −

))

to measure a distance between D and D in

m . In particular, suppose D is a dis-

similarity map coming from evolutionary distances between elements in a set X (or

Mathematical Concepts and Methods in Modern Biology

Search WWH ::

Custom Search

Home