Anchoring in DIALIGN can also be used to speed up the alignment
procedure. Indeed, if an anchor point enforces the alignment of two
selected sequence segments, this reduces the search space of the
remaining automatic alignment procedure. For example, if the middle
positions of two sequences are used as an anchor point, the search
space for the pairwise alignment is reduced by a factor of two:
only the two regions before and after the anchor remain to be
searched, and each covers about a quarter of the original
comparison matrix.
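The factor-of-two argument can be checked with a small Python sketch. It assumes plain Needleman-Wunsch scoring with linear gap costs as a stand-in for DIALIGN's fragment-based objective. The function names, the scoring parameters and the anchor format (0-based position pairs) are hypothetical. The sketch counts how many matrix cells are filled with and without a single anchor at the middle positions.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment score with linear gap costs.

    Returns (score, cells), where cells counts the DP matrix entries filled.
    """
    n, m = len(a), len(b)
    prev = [j * gap for j in range(m + 1)]  # row 0: b[:j] aligned to gaps
    cells = 0
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * m  # column 0: a[:i] aligned to gaps
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cells += 1
        prev = cur
    return prev[m], cells


def anchored_nw_score(a, b, anchors, match=1, mismatch=-1, gap=-1):
    """Global score when each anchor (i, j) forces a[i] to align with b[j].

    Anchors must be strictly increasing in both coordinates. The DP is run
    only on the sub-rectangles between consecutive anchors.
    """
    total, cells = 0, 0
    pi, pj = 0, 0  # start of the current unanchored block in a and b
    for i, j in anchors:
        s, c = nw_score(a[pi:i], b[pj:j], match, mismatch, gap)
        total += s + (match if a[i] == b[j] else mismatch)  # forced column
        cells += c
        pi, pj = i + 1, j + 1
    s, c = nw_score(a[pi:], b[pj:], match, mismatch, gap)
    return total + s, cells + c


a = b = "ACGT" * 25  # two toy sequences of length 100
print(nw_score(a, b))                       # (100, 10000)
print(anchored_nw_score(a, b, [(50, 50)]))  # (100, 4901)
```

With one anchor in the middle of two sequences of length 100, the number of filled cells drops from 10,000 to 4,901. That is roughly the factor of two stated above. Each additional anchor splits the remaining rectangles further.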
Therefore, the anchoring option has also been used to align long
genomic sequences [19, 20]. Here, a fast method for local homology
detection such as BLAST [21] can be used to find strong sequence
homologies, which can then serve as anchor points for DIALIGN.
This approach has been implemented and made available on our web
server [19]. Our anchored-alignment approach to genomic sequence
comparison has also been used to improve the performance of
gene-finding methods in eukaryotes [22]. Other applications of
anchored multiple alignment include studying the behavior of
alignment methods in detail and integrating new algorithmic
approaches for multiple alignment in place of the greedy heuristic
used in the standard version of DIALIGN [23].
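Local hits from a tool such as BLAST can only serve together as anchor points if they are collinear and non-overlapping in both sequences. The following Python sketch picks such a set using weighted chaining by dynamic programming, which is a standard technique for this task. The anchored genomic-alignment pipeline on the web server [19] may select anchors differently. The hit tuple format and the function name `chain_anchors` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical hit format: (start_a, start_b, length, score), 0-based,
# gap-free and of equal length in both sequences (like an ungapped HSP).
Hit = Tuple[int, int, int, float]


def chain_anchors(hits: List[Hit]) -> List[Hit]:
    """Return a maximum-score subset of hits that are collinear and
    non-overlapping, i.e. usable together as anchor points."""
    hits = sorted(hits)  # order by start position in sequence a
    n = len(hits)
    if n == 0:
        return []
    best = [h[3] for h in hits]  # best chain score ending at hit k
    back = [-1] * n              # predecessor of hit k in that chain
    for k, (ak, bk, _, sk) in enumerate(hits):
        for p in range(k):
            ap, bp, lp, _ = hits[p]
            # hit p must end before hit k starts, in both sequences
            if ap + lp <= ak and bp + lp <= bk and best[p] + sk > best[k]:
                best[k] = best[p] + sk
                back[k] = p
    k = max(range(n), key=best.__getitem__)  # last hit of the best chain
    chain = []
    while k != -1:
        chain.append(hits[k])
        k = back[k]
    return chain[::-1]


hits = [(0, 0, 10, 50.0), (20, 25, 10, 40.0), (15, 5, 10, 30.0)]
print(chain_anchors(hits))  # [(0, 0, 10, 50.0), (20, 25, 10, 40.0)]
```

The hit starting at (15, 5) is dropped. In sequence b it starts at position 5, which lies inside the higher-scoring first hit (positions 0 to 9), so the two hits cannot both be enforced as anchors.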
3 DIALIGN-T and DIALIGN-TX
Studies have shown that DIALIGN is often superior to other MSA
tools when sequences that share local homologies are aligned. On
globally related sequences with weak primary-sequence similarity,
however, it tends to be outperformed by strictly global methods such
as CLUSTAL W [24], MUSCLE [5, 25], MAFFT [4], or PROBCONS [26].
One might suspect that this relative weakness is caused by the
greedy optimization method that DIALIGN uses for multiple
alignment. Indeed, it is easy to see that the heuristic in DIALIGN
can produce MSAs with scores far below that of an optimal MSA. If
this were the reason for the program's relative weakness on global,
weak homologies, the natural remedy would be to develop more
efficient optimization algorithms that yield higher-scoring MSAs
with respect to the fragment-based scoring function used in
DIALIGN. This has been done in the past, e.g., in [27, 28]. More
recent results based on anchored alignments indicate, however, that
the relative weakness of DIALIGN on global homologies with low
similarity at the primary-sequence level is caused by the underlying
objective function, and not so much by the greedy optimization
algorithm. Thus, MSAs with mathematically higher scores are not
necessarily more meaningful biologically. We therefore adopted
other approaches to improve the performance of DIALIGN on those
sequence families where strictly global MSA methods were still
superior. This resulted in the development of DIALIGN-T and
DIALIGN-TX.
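The point that a greedy heuristic can yield scores far below the optimum can be illustrated with a deliberately simplified Python sketch. It works on pairwise fragments only, whereas DIALIGN itself works on multiple sequences with its own fragment weights and consistency checks. The fragment format and all function names are hypothetical.

In the sketch, greedy selection takes fragments by decreasing weight and keeps each one that is consistent with those already chosen. The exact optimum uses the fact that pairwise-consistent fragments form a chain, which dynamic programming can find.

```python
from typing import List, Tuple

# Hypothetical pairwise fragment: (start_a, start_b, length, weight), 0-based.
Frag = Tuple[int, int, int, float]


def before(f: Frag, g: Frag) -> bool:
    """True if fragment f ends before g starts in both sequences."""
    return f[0] + f[2] <= g[0] and f[1] + f[2] <= g[1]


def consistent(f: Frag, g: Frag) -> bool:
    """Two fragments can coexist in one pairwise alignment."""
    return before(f, g) or before(g, f)


def greedy_weight(frags: List[Frag]) -> float:
    """Greedy: take fragments by decreasing weight, keep each if consistent."""
    chosen: List[Frag] = []
    for f in sorted(frags, key=lambda x: x[3], reverse=True):
        if all(consistent(f, g) for g in chosen):
            chosen.append(f)
    return sum(f[3] for f in chosen)


def optimal_weight(frags: List[Frag]) -> float:
    """Exact optimum: consistent fragments form a chain, found by DP."""
    frags = sorted(frags)  # order by start position in sequence a
    best = [f[3] for f in frags]  # best chain weight ending at fragment k
    for k in range(len(frags)):
        for p in range(k):
            if before(frags[p], frags[k]):
                best[k] = max(best[k], best[p] + frags[k][3])
    return max(best, default=0.0)


def crossing_example(k: int) -> List[Frag]:
    """One heavy fragment that conflicts with k lighter, mutually
    consistent fragments."""
    heavy = (0, 3 * k, 10, 11.0)
    light = [(20 + 10 * i, 3 * i, 3, 6.0) for i in range(k)]
    return [heavy] + light


frags = crossing_example(5)
print(greedy_weight(frags), optimal_weight(frags))  # 11.0 30.0
```

The greedy selection locks in the single heaviest fragment and must then reject all k lighter fragments that cross it. Its score stays at 11 while the optimum grows as 6k. As the text above argues, closing this score gap does not by itself guarantee biologically better alignments.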