Similarly, the evolutionary lineages, and thus the types, of the
insertion/deletion events that create length differences between two
sequences can be inferred using phylogenetic information from
related sequences.
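The inference described above can be sketched for the simplest case: two sequences, A and B, and one related outgroup sequence. This is only an illustrative parsimony rule, not a published method. It assumes that a single event explains the length difference and that the outgroup shows the ancestral state; the function name and the lineage labels are hypothetical.

```python
def classify_length_difference(a_has_site, b_has_site, outgroup_has_site):
    """Classify a length difference between A and B using an outgroup.

    Assumptions (illustrative only): one event explains the difference,
    and the outgroup reflects the ancestral state at this site.
    """
    if a_has_site == b_has_site:
        return "no length difference"
    # Work out which sequence carries the site and which one lacks it.
    carrier, lacker = ("A", "B") if a_has_site else ("B", "A")
    if outgroup_has_site:
        # The site is ancestral, so the sequence lacking it lost it.
        return f"deletion in lineage {lacker}"
    # The site is absent from the outgroup, so the carrier gained it.
    return f"insertion in lineage {carrier}"


print(classify_length_difference(True, False, True))   # deletion in lineage B
print(classify_length_difference(True, False, False))  # insertion in lineage A
```

Without the outgroup, the two cases are indistinguishable: both appear only as "A is longer than B at this position".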
Over time, sequences accumulate changes. Despite some differences
in genome sizes, we can assume that sequences tend to retain
their approximate length and that the insertion of new characters is
counterbalanced by the deletion of others. After a split from a common
ancestor, the number of substitution differences at homologous positions
in descendant sequences increases until the sequence identity drops to
the level expected for random sequences. The effect of insertions and
deletions is very different. Assuming that each new insertion is not
immediately followed by the deletion of the newly inserted characters,
the total number of independent homologous sites within a set
of sequences keeps increasing. With more than a few sequences in the
set, the increase in the number of independent homologous sites—
and thus the number of columns in the alignment representing
them—is not significantly affected by deletions as the chances of
the same sites being independently deleted in all evolutionary
lineages are small. Thus, the total length of the sequence alignment
correctly representing the evolutionary homology among the characters
is expected to grow roughly linearly with the evolutionary time
covered by the different sequence lineages. Over long periods of
time, the ancestral characters of a neutrally evolving sequence (or
sequence region) are expected to be completely replaced by new
characters through combinations of insertions and deletions: as a
result, the correct evolutionary alignment of highly diverged descendant
sequences should not match a single character. Typically, the
more freely evolving sequence regions are flanked by conserved
regions (e.g., loops and coils vs. core region in protein sequences)
and the alignment is both possible and meaningful.
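The growth argument above can be made concrete with a small calculation. The sketch below is a toy model, not taken from the text. It assumes a star-shaped phylogeny in which all lineages split from the ancestor at time 0, independent per-site deletion at rate `del_rate` in each lineage, a constant gain of `ins_rate` new sites per lineage per unit time, and inserted sites that are never deleted again. All names and parameter values are hypothetical.

```python
import math


def expected_columns(ancestral_len, n_lineages, ins_rate, del_rate, t):
    """Expected number of columns in the true alignment (toy star-tree model).

    A column survives as long as at least one lineage retains its character.
    """
    # Probability that one ancestral site has been deleted in one lineage.
    p_deleted = 1.0 - math.exp(-del_rate * t)
    # The column disappears only if every lineage has deleted the site.
    p_kept_somewhere = 1.0 - p_deleted ** n_lineages
    ancestral_cols = ancestral_len * p_kept_somewhere
    # Each insertion creates a new column (assumed never deleted again).
    inserted_cols = n_lineages * ins_rate * t
    return ancestral_cols + inserted_cols


for t in (0, 10, 20, 40):
    print(t, round(expected_columns(100, 10, 1.0, 0.01, t), 2))
```

With ten lineages, the chance that the same ancestral site is lost in all of them stays negligible over this time range, so the column count is dominated by the linear insertion term. With `n_lineages=1` the deletion term becomes visible, which is the pairwise case where deletions do shorten the alignment.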
In practice, the alignment length rarely grows linearly with the
evolutionary divergence. If the alignment is performed with methods
based on the classical progressive algorithm [3, 4], the alignment
length may grow linearly with the number of substitution
changes for a while, but the growth curves of the two then separate
and the alignment length increases only slowly, if at all (Fig. 2). The
reason for this is that the classical algorithm does not distinguish
insertions from deletions and, inherently, considers all length
differences as deletion events. The use of such biased alignments in
evolutionary analysis is likely to lead to erroneous conclusions.
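One way to picture the stalled growth is to count columns in a toy case. Suppose an ancestor "AC" gives rise to "AGGC" in one lineage and "ATTC" in another, through independent insertions at the same position. An alignment that respects homology keeps the two insertions in separate columns (6 columns), whereas stacking them into shared columns yields only 4. The sketch below is a deliberately simplified caricature of that collapse, not an implementation of any particular aligner; both function names are hypothetical.

```python
def true_alignment_columns(ancestral_len, insertion_lengths):
    """Columns when each independent insertion gets its own columns."""
    return ancestral_len + sum(insertion_lengths)


def collapsed_alignment_columns(ancestral_len, insertion_lengths):
    """Columns when insertions at the same position are stacked together
    (simplifying assumption: all insertions hit one position)."""
    return ancestral_len + max(insertion_lengths, default=0)


# Ancestor "AC"; lineage 1 inserts "GG", lineage 2 inserts "TT".
print(true_alignment_columns(2, [2, 2]))       # 6
print(collapsed_alignment_columns(2, [2, 2]))  # 4

# Adding more lineages with their own insertions widens the gap.
print(true_alignment_columns(2, [2] * 10))       # 22
print(collapsed_alignment_columns(2, [2] * 10))  # 4
```

In this caricature the collapsed count does not grow with the number of lineages, while the homology-respecting count does.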
3 Phylogeny-Aware Alignment
Independent insertions at the same position are not homologous
and have to be identified to allow for their correct placement in
different alignment columns. This alone demonstrates that an
evolutionarily accurate alignment cannot be generated without