16.1.1 Phylogenetic Data
Early evolutionary trees were built by examining similarities and differences in the
form and structure of the organisms of interest. This approach relies on identifying
morphological characters (e.g., the presence of wings) and classifying organisms by
the presence or absence of these features. Each species is represented by a binary
sequence encoding its morphological data, with one bit per character: if the species
has a given feature, the corresponding bit is set to 1; otherwise, it is set to 0. Yet
relying solely on morphological characters can be a major source of phylogenetic
error.
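The binary encoding described above can be illustrated with a minimal Python sketch. The characters and species are hypothetical choices made for illustration, not data from the text, and `encode` and `hamming` are illustrative helper names. Counting mismatched bits (the Hamming distance) is one simple way to compare such sequences.

```python
from itertools import combinations

# Hypothetical morphological characters; each one becomes a single bit.
CHARACTERS = ["wings", "feathers", "hair", "vertebrae"]

# Hypothetical species and the features each possesses (illustration only).
SPECIES_FEATURES = {
    "sparrow": {"wings", "feathers", "vertebrae"},
    "bat": {"wings", "hair", "vertebrae"},
    "mouse": {"hair", "vertebrae"},
    "bee": {"wings"},
}


def encode(features: set[str], characters: list[str]) -> str:
    """Return one bit per character: '1' if the species has it, else '0'."""
    return "".join("1" if c in features else "0" for c in characters)


def hamming(a: str, b: str) -> int:
    """Count the characters in which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    bits = {name: encode(f, CHARACTERS) for name, f in SPECIES_FEATURES.items()}
    for name, b in bits.items():
        print(name, b)
    # Pairwise number of differing characters.
    for s, t in combinations(bits, 2):
        print(s, t, hamming(bits[s], bits[t]))
```

In this toy matrix the bee shares the "wings" bit with the bat and the sparrow, and that bit is counted the same way whether or not the feature reflects common ancestry (bat and bird wings, for instance, evolved independently). This is one way morphological characters can mislead.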
With the advent of molecular data, scientists hoped to avoid the problems inherent in
morphological criteria of relatedness. Today, most trees are built exclusively from
molecular sequences. In sequence data, characters are individual positions (or sites)
in the string, and each character can assume one of four states for nucleotides (A,
C, G, T) or one of 20 states for amino acids. Sequence evolution is studied under
the simplifying assumption that each site evolves independently. Sequences evolve through
point mutations (i.e., changes in the state of a character), plus insertions (including
duplications) and deletions.
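The three kinds of events above can be sketched as Python string operations, together with a Levenshtein edit distance that counts the fewest single-character events separating two sequences. This is a minimal sketch under stated assumptions: the function names are hypothetical, sequences are plain strings, and every event has unit cost. Models of sequence evolution generally weight different changes differently, and the edit distance is only a lower bound on the number of single-character events that actually occurred.

```python
def substitute(seq: str, i: int, state: str) -> str:
    """Point mutation: replace the character at index i with `state`."""
    return seq[:i] + state + seq[i + 1:]


def insert(seq: str, i: int, chars: str) -> str:
    """Insertion: place `chars` before index i.

    A duplication is modeled as inserting a copy of existing characters."""
    return seq[:i] + chars + seq[i:]


def delete(seq: str, i: int, length: int = 1) -> str:
    """Deletion: remove `length` characters starting at index i."""
    return seq[:i] + seq[i + length:]


def edit_distance(a: str, b: str) -> int:
    """Fewest single-character substitutions, insertions, and deletions
    turning a into b (Levenshtein distance with unit costs)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    s = insert("AGGCAT", 2, "C")  # AGCGCAT
    s = substitute(s, 5, "T")     # AGCGCTT
    print(s, edit_distance("AGGCAT", s))  # prints: AGCGCTT 2
```

The dynamic-programming table keeps only one previous row, so memory grows with the length of the second sequence rather than with the product of both lengths.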
Figure 16.1 shows a simple evolutionary history, from the ancestral sequence at
the root (AAGACTT) to modern sequences at the leaves, with evolutionary events
occurring on each edge. This history is incomplete, however, as it does not detail the
events that have taken place along each edge of the tree. Thus, one might conclude that,
to reach the leftmost leaf, labeled AGGCAT, from its parent, labeled AGGGCAT, one
should infer the deletion of one nucleotide (one of the three Gs in the parent). Yet a
more complex scenario may in fact have unfolded. If one were to compare the leftmost
leaf with the rightmost one, labeled AGCGCTT, one could account for the difference
with two changes: starting with AGGCAT, insert a C between the two Gs to obtain
AGCGCAT, then mutate the penultimate A into a T. Yet the tree itself indicates that
the change occurred in a far more complex manner: the path between these two leaves
Figure 16.1 Evolving sequences down a fixed tree. [Tree diagram; root: AAGACTT; other
node labels: AGGGCAT, TGGACTT, TAGCCCT, AGCACTT, AGGCAT, TAGCCCA, TAGACTT, AGCACAA,
AGCGCTT, TGAACTT.]