DIALIGN can be used to speed up the alignment procedure.
Indeed, if an anchor point enforces the alignment of two selected
sequence segments, this reduces the search space of the remaining
automatic alignment procedure (e.g., if the middle positions of two
sequences are used as an anchor point, the search space for the
pairwise alignment is reduced by a factor of two).
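
The factor-of-two example can be checked with a short calculation. The
sketch below is only an illustration under simplifying assumptions: the
search space is taken to be the number of cells in a standard pairwise
dynamic-programming matrix, an anchor is reduced to a single pair of
positions, and the function name dp_cells is hypothetical and not part
of DIALIGN.

```python
def dp_cells(len_a, len_b, anchors=()):
    """Count dynamic-programming cells for a pairwise alignment.

    Assumption: each anchor is a pair (pos_a, pos_b) of positions that
    must be aligned, and the anchors are mutually consistent (increasing
    in both sequences). The matrix then splits into independent
    rectangles between consecutive anchors.
    """
    cells = 0
    prev_a, prev_b = 0, 0
    for pos_a, pos_b in sorted(anchors):
        # Rectangle between the previous anchor and this one.
        cells += (pos_a - prev_a) * (pos_b - prev_b)
        prev_a, prev_b = pos_a, pos_b
    # Rectangle after the last anchor (or the full matrix if none).
    cells += (len_a - prev_a) * (len_b - prev_b)
    return cells


unanchored = dp_cells(1000, 1000)
anchored = dp_cells(1000, 1000, anchors=[(500, 500)])
print(unanchored, anchored, unanchored / anchored)
```

With more anchors spread evenly along both sequences, the same
calculation gives a correspondingly larger reduction, which is why
anchoring becomes attractive for long sequences.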
Therefore, the anchoring option has also been used to align long
genomic sequences [19, 20]. Here, a fast method for local homology
detection such as BLAST [21] can be used to find strong sequence
homologies that can then be used as anchor points for DIALIGN. This
approach has been implemented and made available on our web server
[19]. Our anchored-alignment approach to genomic sequence comparison
has also been used to improve the performance of gene-finding methods
in eukaryotes [22]. Other applications of anchored multiple alignment
include studying the behavior of alignment methods in detail and
integrating new algorithmic approaches for multiple alignment in place
of the greedy heuristic used in the standard version of DIALIGN [23].
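
Local hits from a fast homology search can contradict each other, so
some selection step is needed before they can serve as anchor points.
The following is a minimal sketch of one way to do this for two
sequences, assuming each hit is an ungapped segment pair with a score.
The Hit type, its fields, and the chaining procedure are illustrative
assumptions; they do not describe the BLAST output format or the
procedure used on the web server.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical ungapped local hit between sequences A and B."""
    start_a: int
    start_b: int
    length: int
    score: float


def chain_anchors(hits):
    """Return the highest-scoring chain of collinear, non-overlapping hits."""
    hits = sorted(hits, key=lambda h: (h.start_a, h.start_b))
    if not hits:
        return []
    best = [h.score for h in hits]  # best chain score ending at each hit
    prev = [-1] * len(hits)         # predecessor index for traceback
    for j, hj in enumerate(hits):
        for i in range(j):
            hi = hits[i]
            # hi may precede hj only if it ends before hj starts
            # in both sequences.
            precedes = (hi.start_a + hi.length <= hj.start_a
                        and hi.start_b + hi.length <= hj.start_b)
            if precedes and best[i] + hj.score > best[j]:
                best[j] = best[i] + hj.score
                prev[j] = i
    # Trace back from the best chain end.
    j = max(range(len(hits)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(hits[j])
        j = prev[j]
    return chain[::-1]


example_hits = [
    Hit(10, 12, 20, 50.0),
    Hit(40, 15, 10, 30.0),   # overlaps the first hit in sequence B
    Hit(60, 60, 15, 20.0),
    Hit(100, 105, 30, 80.0),
]
anchors = chain_anchors(example_hits)
print([(h.start_a, h.start_b) for h in anchors])
```

In this example the second hit is dropped because it cannot be placed
in the same chain as the higher-scoring first hit. The quadratic loop
is chosen for clarity; it would be slow for very large numbers of
hits.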
3 DIALIGN-T and DIALIGN-TX
Studies have shown that DIALIGN is often superior to other MSA
tools where sequences with local homologies are aligned. On globally
related sequences with weak primary-sequence similarity, however, it
tends to be outperformed by strictly global methods such as
CLUSTAL W [24], MUSCLE [5, 25], MAFFT [4], or PROBCONS [26].
One might think that a possible reason for this relative
weakness could be the greedy optimization method used for multiple
alignment in DIALIGN. Indeed, it is easy to see that the heuristic in
DIALIGN can produce MSAs with scores far below that of the optimal
MSA. If that were the reason for the relative weakness of the program
on global, weak homologies, one would make efforts to find more
efficient optimization algorithms, leading to higher-scoring MSAs in
the sense of the fragment-based scoring function used in DIALIGN.
This has been done in the past, e.g., in [27, 28]. More recent results
based on anchored alignments indicate, however, that the relative
weakness of DIALIGN on global homologies with low similarity at the
primary-sequence level is caused by the underlying objective function,
and not so much by the greedy optimization algorithm. Thus, MSAs with
mathematically higher scores may not necessarily be biologically more
meaningful. We therefore adopted other
approaches to improve the performance of DIALIGN on those sequence
families where strictly global MSA methods were still superior. This
resulted in the development of DIALIGN-T and DIALIGN-TX.
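
The remark that a greedy heuristic can score far below the optimum can
be illustrated with a toy model. The sketch below is not DIALIGN's
algorithm or its consistency check: fragments are reduced to labels
with weights, conflicts between fragments are declared by hand, and
all names are hypothetical.

```python
from itertools import combinations


def greedy_select(weights, conflicts):
    """Add fragments in order of decreasing weight, skipping any
    fragment that conflicts with one already chosen."""
    chosen = []
    for frag in sorted(weights, key=weights.get, reverse=True):
        if all(frozenset((frag, other)) not in conflicts for other in chosen):
            chosen.append(frag)
    return chosen


def optimal_select(weights, conflicts):
    """Exhaustively find the conflict-free subset with maximal total
    weight (feasible only for tiny toy inputs)."""
    frags = list(weights)
    best, best_score = [], 0.0
    for k in range(1, len(frags) + 1):
        for subset in combinations(frags, k):
            if any(frozenset(pair) in conflicts
                   for pair in combinations(subset, 2)):
                continue
            score = sum(weights[f] for f in subset)
            if score > best_score:
                best, best_score = list(subset), score
    return best


# One heavy fragment that is inconsistent with two slightly lighter,
# mutually consistent fragments.
weights = {"f1": 10.0, "f2": 9.0, "f3": 9.0}
conflicts = {frozenset(("f1", "f2")), frozenset(("f1", "f3"))}

greedy = greedy_select(weights, conflicts)
optimal = optimal_select(weights, conflicts)
print(greedy, sum(weights[f] for f in greedy))
print(optimal, sum(weights[f] for f in optimal))
```

Here the greedy choice commits to the single heaviest fragment and
ends with a total weight of 10, whereas the best conflict-free set
reaches 18. As discussed above, closing such a gap does not by itself
guarantee a biologically better alignment.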