Phylogenetic Tree Reconstruction: Geometric Approaches - Mathematical Concepts and Methods in Modern Biology


If $T$ is a phylogenetic $X$-tree, for any set $X$ with $|X| \geq 4$, and with any positive edge-weighting $\omega$ and induced dissimilarity map $D_{T,\omega}$, show that the four-point condition holds for $T$.
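
As a numerical companion to this exercise, the four-point condition can be checked directly on a small example. The sketch below assumes the standard formulation: for all $x, y, z, w \in X$, not necessarily distinct, the two largest of the three sums $d(x,y)+d(z,w)$, $d(x,z)+d(y,w)$, $d(x,w)+d(y,z)$ are equal. The function names, the tolerance, and the sample weights are illustrative assumptions, not taken from the text, and a passing check on one example is of course not a proof.

```python
from itertools import combinations_with_replacement

def symmetric_map(pairs, labels):
    """Build a symmetric dict-of-dicts map with zero diagonal from unordered pairs."""
    d = {a: {b: 0.0 for b in labels} for a in labels}
    for (a, b), value in pairs.items():
        d[a][b] = d[b][a] = float(value)
    return d

def satisfies_four_point(d, labels, tol=1e-9):
    """Check that, for every choice of four (possibly repeated) labels,
    the two largest of the three pairwise sums are equal."""
    for x, y, z, w in combinations_with_replacement(labels, 4):
        sums = sorted([d[x][y] + d[z][w], d[x][z] + d[y][w], d[x][w] + d[y][z]])
        if sums[2] - sums[1] > tol:
            return False
    return True

labels = ["w", "x", "y", "z"]
# Hypothetical path lengths on a quartet tree:
# pendant weights 1, 2, 3, 4 for w, x, y, z and internal weight 5.
tree_like = symmetric_map({("w", "x"): 3, ("y", "z"): 7, ("w", "y"): 9,
                           ("w", "z"): 10, ("x", "y"): 10, ("x", "z"): 11}, labels)
# The same map with one entry perturbed, so the three sums no longer pair up.
perturbed = symmetric_map({("w", "x"): 3, ("y", "z"): 7, ("w", "y"): 9,
                           ("w", "z"): 10, ("x", "y"): 10, ("x", "z"): 20}, labels)

print(satisfies_four_point(tree_like, labels))   # True
print(satisfies_four_point(perturbed, labels))   # False
```

Allowing repeated labels means the check also covers the triangle inequality, which is the special case $z = w$.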

Returning to an arbitrary labeled $X$-tree $T = (V, E)$, $X \subset V$, with edge-weighting $\omega : E \to \mathbb{R}$, we have, by assumption, that there's no edge between a leaf and itself, so $d_{T,\omega}(x, x) = 0$ for any $x \in X$. This is consistent with notions of evolutionary distance: there's zero dissimilarity between any sequence $s_x$ and itself. It is also presumed that the information encoded by evolutionary distance measures on pairs of sequences $s_x$, $s_y$ is independent of the ordering of the two (i.e., it does not matter which is “first” or “second”). From Exercise 10.8, one can see explicitly that the evolutionary distances $d_J$ and $d_H$ are dissimilarity maps.
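
The two properties just described (zero distance from a sequence to itself, and independence of ordering) are easy to test numerically. The sketch below assumes that $d_H$ denotes the Hamming distance, taken here as a raw count of differing sites; the text may normalize it differently. The function names and the toy sequences are hypothetical.

```python
def hamming(s, t):
    """Number of sites at which two aligned sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))

def is_dissimilarity_map(d, items):
    """Check d(x, x) = 0 and d(x, y) = d(y, x) on all pairs of items."""
    zero_diagonal = all(d(x, x) == 0 for x in items)
    symmetric = all(d(x, y) == d(y, x) for x in items for y in items)
    return zero_diagonal and symmetric

# Hypothetical toy sequences, not data from the text.
toy = ["GATTACA", "GACTACA", "GATTTCA"]
print(is_dissimilarity_map(hamming, toy))  # True
print(hamming(toy[1], toy[2]))             # 2
```

The length check matters in practice: a site-by-site comparison is only meaningful for sequences that have been aligned to the same length.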

We have seen that if one begins with an edge-weighted $X$-tree $T$, one can use the tree and the edge weights to find a value $d_{T,\omega}(x, y)$ for each pair of leaves $x, y \in X$ that has an interpretation as a “distance,” namely, the path length between $x$ and $y$ along the tree $T$, if the “length” of an edge $e$ is taken to be its edge weight $\omega(e)$. The resulting matrix of all values $D_{T,\omega} = [d_{T,\omega}(x, y)]_{x, y \in X}$ is a dissimilarity map. Again, the fundamental biological problem of phylogenetics is to start the other way around: from a relevant set $X$ (species, genes, etc., or sequences standing in for the species, genes, and so on) and a collection $\{\delta(x, y)\}_{x, y \in X}$ of relevant “distances,” find a tree $T$ and an edge-weighting $\omega$ for which the natural dissimilarity map $D_{T,\omega}$ fits the data $D = [\delta(x, y)]_{x, y \in X}$ “well.” This is what is meant by “reconstructing a phylogenetic tree” from the given data, using a distance-based approach. In the most ideal case of “fitting well,” $D_{T,\omega}$ fits $D$ exactly, that is, they agree as functions: $d_{T,\omega}(x, y) = \delta(x, y)$ for all $x, y \in X$. If one begins with a phylogenetic $X$-tree $T$ and a nonnegative edge-weighting $\omega$ for $T$, and takes as data $D = D_{T,\omega}$, then trivially $D_{T,\omega}$ fits $D$ exactly. This raises the issue of consistency in tree reconstruction methods, that is, whether a tree reconstruction method applied to data $D$ that is derived from a tree $T$ (perhaps weighted, with weight $\omega$) actually outputs this tree $T$ (and the corresponding weights $\omega$). One can also speak more generally of statistical consistency of a tree reconstruction method, namely, whether the probability that the method outputs the correct tree approaches 1 as the amount of input data grows.
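
The passage from an edge-weighted tree to its dissimilarity map can be sketched in a few lines: walk the tree from each leaf and accumulate edge weights along the unique path to every other leaf. The dictionary-based tree representation, the function names, and the sample weights below are assumptions made for illustration, not notation from the text.

```python
def tree_distances(edges, leaves):
    """Path-length map between leaves of a tree.

    edges  -- dict mapping an unordered node pair (u, v) to its weight
    leaves -- list of leaf labels (the set X)
    """
    adjacency = {}
    for (u, v), weight in edges.items():
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))

    distances = {}
    for start in leaves:
        reached = {start: 0.0}
        stack = [start]
        while stack:  # depth-first walk; paths in a tree are unique
            node = stack.pop()
            for neighbor, weight in adjacency[node]:
                if neighbor not in reached:
                    reached[neighbor] = reached[node] + weight
                    stack.append(neighbor)
        distances[start] = {leaf: reached[leaf] for leaf in leaves}
    return distances

def fits_exactly(d_tree, data, leaves, tol=1e-9):
    """True when the tree map and the data agree on every pair of leaves."""
    return all(abs(d_tree[x][y] - data[x][y]) <= tol for x in leaves for y in leaves)

# Hypothetical quartet tree with internal nodes u and v.
edges = {("w", "u"): 1.0, ("x", "u"): 2.0, ("u", "v"): 5.0,
         ("y", "v"): 3.0, ("z", "v"): 4.0}
leaves = ["w", "x", "y", "z"]
D = tree_distances(edges, leaves)
print(D["w"]["x"], D["w"]["y"], D["x"]["z"])  # 3.0 9.0 11.0
print(fits_exactly(D, D, leaves))             # True: the trivial exact fit
```

Taking the data to be `D` itself reproduces the trivial exact fit mentioned above; a reconstruction method would instead start from `D` alone and try to recover `edges`.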

Exercise 10.14.

1. For the quartet tree $T$ with cherry $\{x, w\}$ having parent node $u$ and cherry $\{y, z\}$ having parent node $v$, and for the representative sequences on the leaves $X = \{x, w, y, z\}$ below, if $D = d_H$, can you find an edge-weighting $\omega$ for $T$ so that $D_{T,\omega} = D$?

$s_w$ = GATTTCCTTC

$s_x$ = GACATACTTC

$s_y$ = GATTACATTC

$s_z$ = GATTAAACTTC
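
One way to approach a question of this kind is to write out the six path-length equations for the quartet tree (four pendant edges and one internal edge) and see whether they have a solution. The sketch below does this for hypothetical distances, not for the Hamming distances of the sequences above; the function name, the `"internal"` key, and the sample matrix are assumptions made for illustration.

```python
def quartet_weights(d, cherry1, cherry2, tol=1e-9):
    """Solve the path-length equations on the quartet tree with the given cherries.

    Returns a dict of edge weights (pendant edges keyed by leaf, plus "internal"),
    or None when the six equations are inconsistent. Weights may come out zero
    or negative; the caller decides whether that is acceptable.
    """
    a, b = cherry1
    c, e = cherry2
    # Both sums cross the internal edge once per path, so they must agree.
    if abs((d[a][c] + d[b][e]) - (d[a][e] + d[b][c])) > tol:
        return None
    return {
        a: (d[a][b] + d[a][c] - d[b][c]) / 2,
        b: (d[a][b] + d[b][c] - d[a][c]) / 2,
        c: (d[c][e] + d[a][c] - d[a][e]) / 2,
        e: (d[c][e] + d[a][e] - d[a][c]) / 2,
        "internal": (d[a][c] + d[b][e] - d[a][b] - d[c][e]) / 2,
    }

# Hypothetical distances that do come from a weighted quartet tree.
good = {
    "x": {"x": 0, "w": 3, "y": 10, "z": 11},
    "w": {"x": 3, "w": 0, "y": 9, "z": 10},
    "y": {"x": 10, "w": 9, "y": 0, "z": 7},
    "z": {"x": 11, "w": 10, "y": 7, "z": 0},
}
# The same distances with d(x, z) changed, which breaks consistency.
bad = {row: dict(cols) for row, cols in good.items()}
bad["x"]["z"] = bad["z"]["x"] = 20

print(quartet_weights(good, ("x", "w"), ("y", "z")))
print(quartet_weights(bad, ("x", "w"), ("y", "z")))  # None
```

When a solution exists, whether it counts as an edge-weighting still depends on the signs of the resulting weights.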

In the basic parsimony method of tree reconstruction, cherries are selected by linking nodes for which $d_H$ is minimal. For the same quartet tree $T$ with
