Phylogenetic Tree Building Methods - Bioinformatics: A Swiss Perspective


additive or ultrametric, and exact. These restrictive requirements are, however, never met in practice, so the trees returned are suboptimal. UPGMA and neighbor joining (NJ) are examples of these methods.
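
As a rough illustration of what "ultrametric" and "additive" demand of a distance matrix, the sketch below tests the standard three-point and four-point conditions. The function names, the tolerance parameter, and the example matrices are assumptions made for this illustration; they do not come from the text.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple of leaves,
    the two largest pairwise distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        smallest, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True


def is_additive(d, tol=1e-9):
    """Four-point condition: in every quadruple of leaves,
    the two largest of the three pairwise sums are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical distances read off a clock-like tree ((A,B),(C,D)).
clock_like = [
    [0, 2, 8, 8],
    [2, 0, 8, 8],
    [8, 8, 0, 4],
    [8, 8, 4, 0],
]

# Hypothetical distances read off an unrooted tree AB|CD with unequal
# branch lengths (pendant edges 1, 2, 3, 4 and an internal edge of 5).
tree_like = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]

# The same matrix with one entry perturbed, as measurement noise would do.
noisy = [row[:] for row in tree_like]
noisy[0][2] = noisy[2][0] = 9.5
```

Here `clock_like` passes both checks, `tree_like` is additive but not ultrametric, and `noisy` fails both. The last case shows how easily estimated distances violate the exactness requirement.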

The second class is in principle better suited to deal with real data. The goal of these methods is to find the best tree under an explicit optimality criterion, thereby separating the problem of evaluating trees from that of searching among them. Ideally, we would want to score all tree topologies to find the optimal one; however, as shown in Sec. 1, the number of tree topologies grows rapidly with the number of leaves, so that a complete enumeration already becomes impractical for, say, 15 leaves.[d] In all cases, it is the search of the tree space that makes the problem difficult; given a topology, finding the branch lengths is generally easy. Section 3 is devoted to the description of a heuristic approach that can be taken to tackle this problem.
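
To give a sense of the growth mentioned above, the following sketch counts binary tree topologies with labeled leaves using the standard double-factorial formulas: (2n-5)!! for unrooted trees and (2n-3)!! for rooted trees. Sec. 1 is not reproduced here, so treating these as the counts it refers to is an assumption, and the function names are made up for this illustration.

```python
def num_unrooted_topologies(n_leaves):
    """Number of unrooted binary topologies on n labeled leaves: (2n-5)!!"""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n_leaves - 5 + 1, 2):
        count *= k
    return count


def num_rooted_topologies(n_leaves):
    """Number of rooted binary topologies on n labeled leaves: (2n-3)!!

    Rooting a tree amounts to attaching one extra leaf to some branch,
    so the count equals the unrooted count for n + 1 leaves.
    """
    if n_leaves < 2:
        raise ValueError("need at least 2 leaves")
    return num_unrooted_topologies(n_leaves + 1)


if __name__ == "__main__":
    for n in (4, 5, 10, 15, 20):
        print(n, num_unrooted_topologies(n), num_rooted_topologies(n))
```

For 15 leaves this gives 7,905,853,580,625 unrooted topologies (about 7.9 × 10^12), which is consistent with the remark that exhaustive scoring is already impractical at that size.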

In our opinion, the first class is poorly defined. Without an optimization criterion, one can never be sure whether a poor tree reflects an algorithm that is not good enough or data that are not good enough. Hence, we recommend algorithms that have a precise optimization goal.

• Statistical method. The school of statistics used is a further way to classify tree building methods. There are two common approaches for the statistical analysis of empirical data and parameter estimation: the frequentist and the Bayesian approaches. They are divided on the fundamental definition of probability. In the frequentist's definition, probability is seen as the long-run expected frequency of the occurrence of events, whereas the Bayesian definition views probability as a measure of a state of knowledge. Implicit in the frequentist's view is that a parameter φ is a fixed quantity in nature that we wish to measure. In the Bayesian approach, the existence of a true value of φ is not necessarily assumed. To Bayesians, parameters are random variables, not constants. For example, in the frequentist's view of a coin-tossing

[d] All nontrivial tree building can be viewed formally as an NP-complete optimization problem (see Refs. 12-14).
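
The contrast between the two schools can be sketched with a coin-tossing experiment in which the parameter is the probability of heads. This is not the authors' own example, whose text is cut off in this excerpt. The Beta prior, the function names, and the numbers are assumptions chosen for illustration only.

```python
def frequentist_estimate(heads, tosses):
    """Treat the parameter as a fixed unknown constant and return the
    maximum-likelihood point estimate: the observed frequency of heads."""
    return heads / tosses


def bayesian_posterior(heads, tosses, prior_a=1.0, prior_b=1.0):
    """Treat the parameter as a random variable with a Beta(prior_a, prior_b)
    prior (an assumption of this sketch; Beta(1, 1) is uniform).

    With a binomial likelihood the posterior is again a Beta distribution,
    returned here as its two shape parameters.
    """
    tails = tosses - heads
    return prior_a + heads, prior_b + tails


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    heads, tosses = 7, 10  # hypothetical data
    print("frequentist point estimate:", frequentist_estimate(heads, tosses))
    a, b = bayesian_posterior(heads, tosses)
    print("posterior: Beta(%g, %g), mean %.4f" % (a, b, beta_mean(a, b)))
```

With 7 heads in 10 tosses, the frequentist estimate is a single number, 0.7, whereas the Bayesian result is a whole distribution, Beta(8, 4), whose mean of about 0.667 is pulled slightly toward the prior.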
