NJ reconstructs the correct tree if the input distances are additive
and exact. In cases where the distances are estimated, it can be
viewed as a greedy approximation to the ME method. BIONJ 20
and Weighbor 21 are weighted versions of the NJ algorithm; both
methods take into account the estimation error of the input
distances.
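The additivity claim can be checked on a small example. The following is a minimal sketch of the NJ procedure, assuming the standard pair-selection criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of the distances from node i to all other nodes. The function name, the node labels, and the four-leaf distance table are illustrative assumptions, not taken from the text. The distances are exactly additive on a tree with long branches to A and C (length 3), short branches to B and D (length 1), and an internal branch of length 1.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Sketch of NJ. dist maps frozenset({a, b}) -> distance.
    Returns (joins, last_edge); each join is (a, b, len_a, len_b)."""
    d = dict(dist)
    nodes = list(names)
    joins = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: sum of distances from node i to every other active node
        r = {i: sum(d[frozenset((i, j))] for j in nodes if j != i)
             for i in nodes}
        # choose the pair that minimises Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dab = d[frozenset((a, b))]
        # branch lengths from a and b to the new internal node
        len_a = 0.5 * dab + (r[a] - r[b]) / (2 * (n - 2))
        len_b = dab - len_a
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # distances from the new node to all remaining nodes
        for k in nodes:
            if k not in (a, b):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - dab)
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b, len_a, len_b))
    x, y = nodes
    return joins, (x, y, d[frozenset((x, y))])

# Toy additive distances (assumed for illustration only).
pairs = {("A", "B"): 4, ("A", "C"): 7, ("A", "D"): 5,
         ("B", "C"): 5, ("B", "D"): 3, ("C", "D"): 4}
dist = {frozenset(k): v for k, v in pairs.items()}
joins, last_edge = neighbor_joining(["A", "B", "C", "D"], dist)
print(joins, last_edge)
```

In this example the smallest raw distance is between B and D, yet the Q criterion joins A with B and C with D, which is the tree the distances were generated from.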
Parsimony. Parsimony is a nonparametric, optimality-based character
method. It selects the tree that requires the minimum number of
character changes to explain the evolution of the leaves. Being
nonparametric, the method does not rely on an explicit model of
character evolution. Felsenstein 22 demonstrated that parsimony
can lead to LBA. He used a simple model of evolution and a quartet
with characteristics similar to the one given in the subsection on
UPGMA (two long branches separated by a short one) to show that
the trees inferred by parsimony tend to group the two long
branches together.
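To make the "minimum number of character changes" concrete, here is a minimal sketch that scores a fixed tree with Fitch's small-parsimony algorithm, a standard way to count changes for unordered characters (the text does not name a specific counting algorithm, so this choice is an assumption). Trees are nested tuples of leaf names, and the four-sequence alignment is a made-up toy in which the long-branch leaves A and C share states at two sites.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one character on a binary tree.
    tree: leaf name (str) or a pair (left, right); states: leaf -> state."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # children agree on at least one state: no extra change needed
        return common, left_changes + right_changes
    # children disagree: take the union and count one change
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, seqs):
    """Sum the Fitch change counts over all alignment columns."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(n_sites)
    )

# Toy alignment (assumed): site 1 supports AB|CD, sites 2 and 3 group A with C.
seqs = {"A": "AGGT", "B": "AAAT", "C": "CGGT", "D": "CAAT"}
topologies = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
scores = {t: parsimony_score(t, seqs) for t in topologies}
best = min(scores, key=scores.get)
print(scores, best)
```

With this toy data the topology that pairs A with C has the lowest score, so parsimony would select it regardless of how the shared states arose.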
Maximum likelihood (ML). ML estimation belongs to the frequentist
school of statistics and is a method to fit the parameters (P) of a
mathematical model (M) to given data (D). The probability of the
data given the model, Pr(D | M, P), is central to the ML method.
When the model (including its parameters) is kept constant, the
probabilities of all possible data sets sum to one. The ML method
takes a different view: it considers Pr(D | M, P) a function of the
parameters, while the data and the form of the model are kept
constant. In this case, Pr(D | M, P) is called the likelihood
function for P, often written as L(P) = Pr(D | M, P). The ML method
simply chooses the parameters P̂ that maximize L(P), i.e.
P̂ = argmax_P L(P). In other words, the parameters chosen are those
that make the input data most likely.
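As a small illustration of choosing the parameter that maximizes the likelihood, the sketch below estimates a single parameter, the evolutionary distance d between two aligned sequences. It assumes the Jukes-Cantor (JC69) substitution model, which the text above does not prescribe; the data are reduced to k differing sites out of n, and the function names and the values of k and n are hypothetical. Under JC69 a site differs with probability p(d) = 3/4 (1 - exp(-4d/3)), so the log-likelihood is k log p + (n - k) log(1 - p) up to an additive constant.

```python
import math

def jc_log_likelihood(d, k, n):
    """Log-likelihood of distance d (up to a constant) under JC69,
    given k differing sites out of n aligned sites."""
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def ml_distance_grid(k, n, step=0.001, d_max=3.0):
    """Pick the grid value of d with the highest log-likelihood (argmax)."""
    grid = [step * i for i in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: jc_log_likelihood(d, k, n))

def ml_distance_closed_form(k, n):
    """Analytic maximiser for JC69; valid only when 0 < k / n < 3/4."""
    return -0.75 * math.log(1.0 - 4.0 * k / (3.0 * n))

# Hypothetical data: 20 differences in 100 aligned sites.
k, n = 20, 100
d_grid = ml_distance_grid(k, n)
d_exact = ml_distance_closed_form(k, n)
print(d_grid, d_exact)
```

The grid search and the closed-form estimate agree to within the grid spacing. For tree building the same principle applies, but the search also has to cover topologies and all branch lengths, so a simple grid is no longer practical.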
For phylogenetic tree building, the parameters to be estimated
include at least the tree topology and the corresponding branch
lengths, but they can also include other model parameters such as
the shape parameters of site rate distributions. ML can, in
principle, be applied to any type of data, as long as a model can
be specified under which the data could have been generated. We
have mentioned above that, under certain assumptions, LS tree
construction leads to ML distance trees. Nevertheless, when we
speak of ML