Biomedical Engineering Reference
In-Depth Information
sequences. MP attempts to minimize this score following the philosophy of Occam's
razor — the simplest explanation of the data is preferred. Under MP, a total cost is
assigned to each tree, and the optimal tree (i.e., the most parsimonious tree) is defined
as the one with the smallest total cost. In this approach, a unit cost is given for each
nucleotide substitution. A central step in the procedure is allocating sequences to the
internal nodes in the tree. For any set of sequence allocations, the total cost of the tree
is the sum of the costs of the various edges, where the cost of joining two internal
nodes, or an internal node and a leaf, is the number of substitutions needed to move
from the sequence at one to the sequence at the other (i.e., the Hamming distance).
The most popular software packages that implement MP heuristics are PAUP* [3],
Phylip [4], and TNT [6].
Likelihood Methods Likelihood methods require the specification of an evolu-
tionary model. For a given phylogenetic arrangement, the question is: what is the
likelihood that evolution under the specified parameters will produce the observed
nucleotide sequences? For a given evolutionary hypothesis, the likelihood of the
observed change is computed for each position, and then the product of the likelihoods
is expressed as a distance or branch length between the sequences. The parameters are
then varied, and the combination with the highest likelihood is accepted. This proce-
dure is then repeated for another arrangement, the two topologies compared, and the
one with the highest likelihood selected. The selective process is continued until an
arrangement is found with the combined ML of both an evolutionary hypothesis and
a topology. Finding the best tree under ML is the most computationally demanding
of the methods discussed here.
Like MP, ML is an optimization problem. ML seeks the tree and associated
model parameter values that maximize the probability of producing the given set of
sequences. ML thus depends explicitly on an assumed model of evolution. For exam-
ple, the ML problem under the Jukes-Cantor model needs to estimate one parameter
(the substitution probability) for each edge of the tree, while under theGeneralMarkov
model 12 parameters must be estimated on each edge. Scoring a fixed tree cannot
be done in polynomial time for ML [15], whereas it is easily accomplished in lin-
ear time for MP [16]. Various software packages provide heuristics for ML, include
PAUP* [3], Phylip [4], fastDNAml [17], and PHYML [18].
Bayesian methods deserve a special mention among likelihood-based approaches;
they compute the posterior probability that the observed data would have been
produced by various trees (in contrast to a true ML method, which computes the
probability that a fixed tree would produce various kinds of data at its leaves). Their
implementation with Markov chain Monte-Carlo (MCMC) algorithms often run sig-
nificantly faster than pure ML methods. Moreover, the moves through state space can
be designed to enhance convergence rates and speedup execution. Mr-Bayes [5] is the
most popular software package for reconstructing trees based on Bayesian analysis.
16.1.3 Large-Scale Phylogenies
The remainder of this paper considers high-performance approaches to inferring trees
under MP. Our reasons for focusing on MP are twofold. First, MP remains the major
Search WWH ::




Custom Search