Assuming that the alignment guide tree is correct and that the
sequences are relatively closely related, the progressive alignment
approach provides the information necessary to identify insertion
and deletion events. The phylogeny-aware progressive algorithm
implemented in PRANK [5, 6] uses outgroup information from the
next alignment step to decide whether the length difference observed
between the aligned sequences (either true extant sequences or
internal nodes representing an aligned subset) was caused by an
insertion or a deletion (Fig. 3). By identifying the true
evolutionary event, the phylogeny-aware algorithm can handle
insertions correctly and avoid penalizing a single event multiple
times in later stages of the alignment.
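The outgroup reasoning described above can be illustrated with a toy sketch. This is not PRANK's implementation: PRANK reaches the decision through the alignment itself, whereas the sketch assumes the two sister sequences and the outgroup have already been placed in common columns. The function name classify_length_difference and the row labels "a" and "b" are hypothetical, introduced only for illustration.

```python
def classify_length_difference(sister_a, sister_b, outgroup):
    """Classify each gapped column between two aligned sisters.

    All three rows are assumed to be pre-aligned (equal length, '-' = gap).
    Returns a list of (column, event, lineage) tuples. This is a simple
    parsimony-style reading of the outgroup, not PRANK's actual procedure.
    """
    if not (len(sister_a) == len(sister_b) == len(outgroup)):
        raise ValueError("rows must have equal length")

    events = []
    for col, (a, b, o) in enumerate(zip(sister_a, sister_b, outgroup)):
        a_gap, b_gap = a == "-", b == "-"
        if a_gap == b_gap:
            continue  # no length difference between the sisters here
        gapped, ungapped = ("a", "b") if a_gap else ("b", "a")
        if o == "-":
            # Outgroup lacks the site: the character was gained
            # in the sister that carries it.
            events.append((col, "insertion", ungapped))
        else:
            # Outgroup has the site: the character was lost
            # in the sister that shows the gap.
            events.append((col, "deletion", gapped))
    return events


if __name__ == "__main__":
    # Same pair of sisters, two different outgroups.
    print(classify_length_difference("ACGTTTACG", "ACG---ACG", "ACG---ACG"))
    print(classify_length_difference("ACGTTTACG", "ACG---ACG", "ACGTTTACG"))
```

The same length difference between the sisters is read as an insertion with the first outgroup and as a deletion with the second, which is the distinction the classical algorithm does not make.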
The phylogeny-aware algorithm flags sites that contain an
alignment gap in the immediately preceding stage of the progressive
alignment, allowing for free placement of new gaps at flagged
positions in the very next round. For an insertion, a new gap is
created at exactly the same position and the flags indicating the gap
are retained; for a gap caused by a deletion, a better alignment is
obtained by matching the sites and the flags are removed (Fig. 3).
Because the algorithm keeps the inserted sites at the later stages of
the progressive alignment, the sequences it reconstructs for the
internal nodes of the alignment tree may not reflect the true length
of the ancestral sequences. Despite that, identifying and
marking the insertion events avoids penalizing the same
events multiple times and provides a significant improvement over
the classical algorithm, which in practice treats all length
differences as deletions.
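The effect of flagged sites on the next alignment round can be sketched with a small dynamic-programming example. It is a simplified illustration under stated assumptions, not the PRANK algorithm: it uses a linear gap cost, a plain match/mismatch score, and a single ancestral string instead of a profile. The function name align_with_flags and all scoring values are hypothetical.

```python
def align_with_flags(anc, flags, seq, match=1, mismatch=-1, gap=-2):
    """Globally align `seq` against `anc`, where flags[i] is True if site i
    of `anc` contained a gap in the preceding alignment stage.

    Skipping a flagged site costs nothing; any other gap costs `gap`.
    Returns (score, retained), where retained[i] is True for flagged sites
    that were gapped again (flag kept) and False for sites that were
    matched (flag removed) or were never flagged.
    """
    n, m = len(anc), len(seq)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]

    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + (0 if flags[i - 1] else gap)
        move[i][0] = "skip_anc"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
        move[0][j] = "skip_seq"

    for i in range(1, n + 1):
        skip_cost = 0 if flags[i - 1] else gap  # free gap at flagged site
        for j in range(1, m + 1):
            sub = match if anc[i - 1] == seq[j - 1] else mismatch
            options = [
                (score[i - 1][j - 1] + sub, "match"),
                (score[i - 1][j] + skip_cost, "skip_anc"),
                (score[i][j - 1] + gap, "skip_seq"),
            ]
            score[i][j], move[i][j] = max(options, key=lambda t: t[0])

    # Trace back to see which flagged sites were gapped again.
    retained = [False] * n
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            i, j = i - 1, j - 1
        elif step == "skip_anc":
            i -= 1
            retained[i] = flags[i]
        else:
            j -= 1
    return score[n][m], retained


if __name__ == "__main__":
    anc = "ACGTTTACG"
    flags = [False] * 3 + [True] * 3 + [False] * 3  # TTT was gapped before
    print(align_with_flags(anc, flags, "ACGACG"))         # outgroup lacks TTT
    print(align_with_flags(anc, flags, "ACGTTTACG"))      # outgroup has TTT
    print(align_with_flags(anc, [False] * 9, "ACGACG"))   # no flags at all
```

In the first call the gap is reopened at the flagged sites without cost and the flags are retained, as for an insertion. In the second call the flagged sites are matched and the flags are dropped, as for a deletion. The third call, with no flags, pays the gap cost again for the same event, which mirrors the repeated penalization attributed to the classical algorithm.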
Penalizing a single event multiple times may seem an insignificant
error if the procedure nevertheless reconstructs the correct
alignment. In trivial alignment tasks that may be the case, but in
more complex ones the classical algorithm allows insertions to be
matched with non-homologous characters, and the resulting
alignments indicate false homologies (Fig. 4). The heuristics
proposed to correct for insertion events by lowering the gap cost
at sites already containing gaps (e.g., [7, 8]) cannot prevent this; on
the contrary, they typically cause further errors by moving gaps caused
by deletion events at nearby sites to the same columns, producing
block-like alignments with alternating gappy and conserved regions
(see Fig. 1). The basic version of the phylogeny-aware algorithm
greatly reduces the problem, but even it cannot completely avoid
the matching of independent insertions, especially in the alignment
of large datasets, in which the chance of mutation events at nearby
positions is significant (Fig. 2).
As discussed above, the phylogeny-aware algorithm identifies
the type of insertion-deletion event and then handles the event
accordingly, either creating a new gap or removing the flags
indicating the gap. A variant of the phylogeny-aware algorithm, known
as PRANK+F, uses this information to mark sites at which the flagged