If large sequences are to be aligned, program run time becomes an issue. As with more traditional alignment methods, the run time of DIALIGN for pairwise alignment is proportional to the product of the lengths of the input sequences [16]. This is too slow for aligning large genomic sequences. To reduce the program run time, a previously developed anchoring option proved to be useful.
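To make the run-time argument concrete, the following minimal sketch compares the number of position pairs to be considered with and without anchoring. It is an illustration under stated assumptions, not DIALIGN's actual implementation: the cost is taken to be exactly the product of the segment lengths, each anchor is simplified to a single fixed pair of positions, and only the regions between consecutive anchors are searched. The function names `pairwise_cost` and `anchored_cost` are hypothetical.

```python
def pairwise_cost(len_a, len_b):
    """Position pairs to consider, assuming cost == len_a * len_b."""
    return len_a * len_b


def anchored_cost(len_a, len_b, anchors):
    """Cost when the search is confined to the regions between anchors.

    anchors: list of (pos_a, pos_b) pairs (0-based), strictly increasing
    in both coordinates. Each pair is assumed to fix position pos_a of
    the first sequence to position pos_b of the second (a simplification).
    """
    total = 0
    prev_a, prev_b = 0, 0
    for pos_a, pos_b in anchors:
        # Only the block between the previous anchor and this one is searched.
        total += pairwise_cost(pos_a - prev_a, pos_b - prev_b)
        prev_a, prev_b = pos_a + 1, pos_b + 1
    # Remaining block after the last anchor.
    total += pairwise_cost(len_a - prev_a, len_b - prev_b)
    return total


if __name__ == "__main__":
    n = 100_000
    evenly_spaced = [(i, i) for i in range(10_000, n, 10_000)]
    print(pairwise_cost(n, n), anchored_cost(n, n, evenly_spaced))
```

Under these assumptions, each additional well-placed anchor replaces one large rectangle of the search space by two smaller ones, which is why the product-of-lengths cost drops sharply for long sequences.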
2 Anchored Alignment
Most MSA methods are fully automated and do not require any human intervention. The input from the user is restricted to selecting a set of input sequences and choosing the necessary parameter values, e.g., for gap penalties. In most cases, default parameter values are used that have been found useful in the typical situations where a program is used.
Automated alignment is clearly required where no further information about the input sequences is available. Also, if large data sets are to be processed and manual intervention would be too time consuming, automated MSA is mandatory. It should be clear, however, that the accuracy of automatic methods for sequence analysis is fundamentally limited. At best, they can produce alignments with a (near-)optimal quality score in some mathematical sense. But there can be no guarantee that mathematically optimal or high-scoring alignments are biologically meaningful.
The standard version of DIALIGN is fully automated, i.e., like other MSA methods, it works without human intervention. The only input parameter is a threshold T for the quality of the local similarities considered for alignment. Often, however, an expert user already has some information about (putative) homologies among the input sequences. In this case, it is desirable to force an MSA program to align these homologies and to align only the remainder of the sequences in the usual automatic fashion.
For this reason, DIALIGN has an option for anchored alignment where MSAs are produced in a semi-automatic way [17, 18]. With this option, the user can select parts of the input sequences that are to be aligned to each other. The final alignment produced by DIALIGN can then be seen as an extension of this user-specified alignment anchor. In more detail, the user selects equal-length pairs of sequence segments that will end up aligned to each other without gaps. Such pairs of segments are called anchor points. In general, it may not be possible to align all of the specified anchor points in one single output alignment, so it may be necessary to discard some of the user-defined anchor points. Therefore, the user has to assign a score to each anchor point, which determines its priority in case not all anchor points can be used.
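The role of the anchor-point scores can be illustrated with a small sketch. It makes several assumptions that are not stated in the text: it handles only two sequences, it treats any two overlapping or crossing anchor points as conflicting, and it resolves conflicts greedily in order of decreasing score. The names `AnchorPoint`, `compatible`, and `select_anchors` are hypothetical and do not correspond to DIALIGN's code or input format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchorPoint:
    start_a: int   # 0-based start of the segment in the first sequence
    start_b: int   # 0-based start of the segment in the second sequence
    length: int    # common length of both segments (aligned without gaps)
    score: float   # user-assigned priority


def compatible(p, q):
    """True if p and q can both be used in one pairwise alignment.

    Simplifying assumption: one anchor point must lie entirely before
    the other in both sequences (no overlap, no crossing).
    """
    p_before_q = (p.start_a + p.length <= q.start_a
                  and p.start_b + p.length <= q.start_b)
    q_before_p = (q.start_a + q.length <= p.start_a
                  and q.start_b + q.length <= p.start_b)
    return p_before_q or q_before_p


def select_anchors(points):
    """Greedily keep anchor points by decreasing score (an assumed strategy)."""
    accepted = []
    for p in sorted(points, key=lambda x: x.score, reverse=True):
        # Discard p if it conflicts with any higher-priority anchor point.
        if all(compatible(p, q) for q in accepted):
            accepted.append(p)
    return sorted(accepted, key=lambda x: x.start_a)


if __name__ == "__main__":
    points = [
        AnchorPoint(0, 0, 5, 3.0),
        AnchorPoint(10, 2, 4, 1.0),   # conflicts with the first point in sequence b
        AnchorPoint(20, 20, 3, 2.0),
    ]
    for p in select_anchors(points):
        print(p)
```

In this toy input, the second anchor point is discarded because its segment in the second sequence overlaps the segment of a higher-scoring point, while the other two are kept. With more than two sequences, the consistency check is more involved than this pairwise test.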
In addition to including expert knowledge in otherwise automatically produced MSAs, the anchored-alignment option in