Biology Reference
In-Depth Information
the many MSA construction programs (e.g. CLUSTAL W [3],
MAFFT [4], MUSCLE [5]) that use, as a means to an end, a guide tree
obtained by some crude method like UPGMA [6] (unweighted
pair group method using arithmetic averages). The close relationship
between the two problems has motivated
phylogeny methods that operate on unaligned sequences
and solve the extended problem of combined MSA and tree
inference (see, for example, Refs. 7-9). Sequence-based methods
rely on similarities in the amino acid or nucleotide sequences
of genes.
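As a rough illustration of how a guide tree can be obtained by UPGMA, the sketch below repeatedly merges the two closest clusters and averages their distances to all remaining clusters. The function name upgma, the nested-tuple tree representation, and the frozenset-keyed distance table are assumptions made for this example. Branch lengths are omitted, and this is not the implementation used by any of the programs named above.

```python
from itertools import combinations


def upgma(names, dist):
    """Return a rooted tree topology as nested tuples (no branch lengths).

    names: list of leaf labels (hypothetical input format)
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)                       # working copy of the distances
    while len(sizes) > 1:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted (arithmetic) average of the two old distances.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


pairwise = {
    ("A", "B"): 2.0, ("C", "D"): 4.0,
    ("A", "C"): 8.0, ("A", "D"): 8.0,
    ("B", "C"): 8.0, ("B", "D"): 8.0,
}
example_dist = {frozenset(pair): value for pair, value in pairwise.items()}
guide_tree = upgma(["A", "B", "C", "D"], example_dist)
```

With these made-up distances, A and B are joined first, then C and D, and the two pairs are joined last.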
Evolution also operates on whole genomes through the inversion,
transposition, and duplication of genes or groups of genes. Gene content
methods use the presence/absence profiles of orthologous
genes to construct trees. Together with the sequence-based
methods, they can be classified as character-based methods. In
recent years, interest has grown in genome rearrangement tree-building
algorithms that take the order of genes on a chromosome as
input. Popular implementations include MGR [10]
and GRAPPA [11].
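To make the two kinds of genome-level input more concrete, the sketch below builds presence/absence profiles from gene family lists and counts breakpoints between two gene orders. The function names, the input formats, and the use of unsigned linear gene orders are simplifying assumptions for illustration only. This is not the procedure used by MGR or GRAPPA.

```python
def presence_absence_profiles(genomes):
    """Turn gene family lists into 0/1 profiles over a common family list.

    genomes: dict mapping genome name -> iterable of orthologous gene
             family identifiers (hypothetical input format)
    """
    gene_sets = {name: set(genes) for name, genes in genomes.items()}
    families = sorted(set().union(*gene_sets.values()))
    profiles = {
        name: tuple(int(family in genes) for family in families)
        for name, genes in gene_sets.items()
    }
    return families, profiles


def breakpoint_count(order_a, order_b):
    """Count adjacencies of order_a that are not adjacencies of order_b.

    Both arguments are unsigned linear gene orders over the same genes;
    gene orientation is ignored in this simplified version.
    """
    adjacent_in_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return sum(
        frozenset(pair) not in adjacent_in_b
        for pair in zip(order_a, order_a[1:])
    )


families, profiles = presence_absence_profiles(
    {"X": ["g1", "g2"], "Y": ["g2", "g3"]}
)
# Reversing the segment (3, 4) breaks the adjacencies 2-3 and 4-5.
n_breakpoints = breakpoint_count([1, 2, 3, 4, 5], [1, 2, 4, 3, 5])
```

The profiles can serve as binary characters for a character-based method, while the breakpoint count is one simple way to quantify how far two gene orders have diverged.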
Distance methods rely on a measure of the pairwise evolutionary
distances between the objects (sequences, genomes, species)
being classified. The distances should reflect the leaf-to-leaf path
lengths of an underlying tree. Typically, they are estimated from
character or gene order data in a statistical framework under some
model of evolution, but they can also be obtained from other
sources such as DNA-DNA hybridization experiments.
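One common way to estimate such a distance from aligned DNA sequences is the Jukes-Cantor correction, which converts the observed proportion of differing sites p into an expected number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The text does not prescribe a particular model, so the choice of Jukes-Cantor here is an assumption, as are the function names and the handling of gaps.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in columns) / len(columns)


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        # The correction is undefined at saturation; report infinity.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGT", "ACGTACGA")  # one difference in eight sites
d = jukes_cantor_distance(p)
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site that a simple count of differences would miss.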
Sequence and distance methods are the most established
ones and can be used to build gene as well as species trees,
whereas gene content and gene order methods lead to species
trees only.
Model assumptions on the data. All tree building methods make,
explicitly or implicitly, assumptions about the data. For character-
based methods, the assumptions often refer to the evolutionary
processes under which the data arose. For example, sequence-
based methods commonly use a first-order Markovian model of