Biology Reference
In-Depth Information
extended to N
2 sequences. Such well-conserved local MSA
segments can work as anchor points, by which the expensive MSA
routines can be confined in the narrower range between neighbor-
ing anchors or sequence ends, leading to great reduction in both
computational time and space.
Anchoring is a virtually indispensable strategy in genome
sequence alignment. In one of the first attempts to align whole
genomes, Delcher et al. [ 77 ] introduced the concept of maximal
unique match (MUM), defined as a locally longest subsequence
that occurs exactly once in the sequences under comparison. MUM
anchoring is used in a local multiple genome alignment program,
Mauve [ 78 ]. Instead of MUMs, several genomic MSA programs,
such as MAG [ 79 ], GAME [ 80 ], and MISHIMA [ 81 ], use maximal
exact matches (MEMs) that are locally longest common subse-
quences without the uniqueness condition of MUM. MUMs and
MEMs can be efficiently detected by a suffix tree, suffix array, or
simple k -mer table [ 82 ]. Simultaneous occurrence of MUMs or
MEMs in multiple sequences is often too stringent a condition. To
improve the sensitivity, MLAGAN [ 83 ] and MAVID [ 84 ] rely on
the standard progressive/iterative strategy, where anchoring is used
at the pairwise alignment phase, as implemented in GLASS [ 85 ],
LAGAN [ 83 ], and AVID [ 86 ].
Anchoring also plays the essential role in segment-based MSA
methods, such as DIALIGN [ 87 ] and SeqAn::T-Coffee [ 88 ].
A segment means a locally best matched alignment typically found
by the Smith-Waterman-Gotoh algorithm [ 10 , 89 ], and thus may
contain internal mismatches and indels unlike MUM or MEM.
Those segments are added in the order of their alignment scores
to build up an MSA. The most critical issue in this process is to keep
consistency in the broader sense between the new segment and
those already incorporated in the MSA under construction. Here,
the consistency in the broader sense implies a right positional order
(partial order) between independently obtained segments. This
type of consistency also plays the pivotal role in the greedy MSA
construction algorithm called sequence annealing [ 20 , 90 ] and
related strategy adopted in PicXAA [ 91 ], in which an MSA column
or residue pair, instead of a segment, is the building block of MSA.
A directed acyclic graph is used to encode partially ordered building
blocks and to facilitate efficient check of consistency among them.
The initial motivation of the study on consistency in the
narrower sense [ 21 ] was to find anchor points to accelerate overall
MSA computation. Instead of consistency, however, several
contemporary MSA programs, such as PRRN/PRRP [ 52 ],
RASCAL [ 92 ], MAFFT [ 65 ], and MUSCLE [ 41 ], rely on local
alignment scores to heuristically identify anchor points in the itera-
tive refinement phase, at an expense of slight loss of alignment
quality [ 93 ].
>
Search WWH ::




Custom Search