Multiple Sequence Alignment with DIALIGN - Multiple Sequence Alignment Methods

If large sequences are to be aligned, program run time becomes an issue. As with more traditional alignment methods, the run time of DIALIGN for pairwise alignment is proportional to the product of the lengths of the input sequences [16]. This is too slow for aligning large genomic sequences. To reduce the program run time, a previously developed anchoring option proved to be useful.
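To make the scaling concrete, the following minimal sketch counts position pairs rather than measuring real run time. It rests on two assumptions that are not taken from the text: anchors are treated as zero-length cut points, and the remaining alignment is restricted to the regions between consecutive anchors. The function names pairwise_cost and anchored_cost are hypothetical and are not part of DIALIGN.

```python
def pairwise_cost(len1, len2):
    """Number of position pairs when run time grows with len1 * len2."""
    return len1 * len2


def anchored_cost(len1, len2, cut_points):
    """Sum of sub-problem sizes when anchors split both sequences.

    cut_points is a list of (pos1, pos2) positions, sorted and increasing
    in both sequences. Assumption: only the regions between consecutive
    cut points still have to be compared.
    """
    cost = 0
    prev1 = prev2 = 0
    for pos1, pos2 in cut_points:
        cost += (pos1 - prev1) * (pos2 - prev2)
        prev1, prev2 = pos1, pos2
    # Region after the last cut point.
    cost += (len1 - prev1) * (len2 - prev2)
    return cost


if __name__ == "__main__":
    full = pairwise_cost(1000, 1000)
    cuts = [(250, 250), (500, 500), (750, 750)]
    print(full, anchored_cost(1000, 1000, cuts))
```

Under these assumptions, three evenly spaced cut points reduce the count for two sequences of length 1000 from 1,000,000 to 250,000 position pairs, which is the kind of saving that makes anchoring attractive for long sequences.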

2 Anchored Alignment

Most MSA methods are fully automated and do not require any human intervention. The input from the user is restricted to selecting a set of input sequences and choosing the necessary parameter values, e.g., for gap penalties. In most cases, default parameter values are used that have been found useful in the typical situations where a program is used.

Automated alignment is clearly required where no further information about the input sequences is available. Also, if large data sets are to be processed and manual intervention would be too time consuming, automated MSA is mandatory. It should be clear, however, that the accuracy of automatic methods for sequence analysis is fundamentally limited. At best, they can produce alignments with a (near-)optimal quality score in some mathematical sense. But there can be no guarantee that mathematically optimal or high-scoring alignments are biologically meaningful.

The standard version of DIALIGN is fully automated, i.e., like other MSA methods, it works without human intervention. The only input parameter is a threshold T for the quality of the local similarities considered for alignment. Often, however, an expert user already has some information about (putative) homologies among the input sequences. In this case, it is desirable to force an MSA program to align these homologies and to align only the remainder of the sequences in the usual automatic fashion.

For this reason, DIALIGN has an option for anchored alignment where MSAs are produced in a semi-automatic way [17, 18]. With this option, the user can select parts of the input sequences that are to be aligned to each other. The final alignment produced by DIALIGN can then be seen as an extension of this user-specified alignment anchor. In more detail, the user selects equal-length pairs of sequence segments that will end up aligned to each other without gaps. Such pairs of segments are called anchor points. In general, it may not be possible to align all of the specified anchor points in one single output alignment, so it may be necessary to discard some of the user-defined anchor points. Therefore, the user has to assign a score to each anchor point that determines its priority in case not all anchor points can be used.
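The role of the anchor-point scores can be illustrated with a small sketch. It is restricted to two sequences and uses a simple greedy rule (highest score first, discard anything that conflicts with an anchor already kept). The greedy rule, the class AnchorPoint, and the functions compatible and select_anchors are illustrative assumptions, not DIALIGN's actual data format or procedure. Two anchor points are treated as compatible here only if they do not overlap and appear in the same order in both sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnchorPoint:
    start1: int   # 0-based start of the segment in the first sequence
    start2: int   # 0-based start of the segment in the second sequence
    length: int   # both segments share this length (aligned without gaps)
    score: float  # user-assigned priority


def compatible(a, b):
    """True if both anchor points fit into one pairwise alignment.

    Simplification: the earlier anchor must end before the later one
    starts, in both sequences.
    """
    first, second = (a, b) if a.start1 <= b.start1 else (b, a)
    return (first.start1 + first.length <= second.start1
            and first.start2 + first.length <= second.start2)


def select_anchors(anchors):
    """Greedily keep anchor points in order of decreasing score."""
    chosen = []
    for cand in sorted(anchors, key=lambda a: a.score, reverse=True):
        if all(compatible(cand, kept) for kept in chosen):
            chosen.append(cand)
    # Report the kept anchors in sequence order.
    return sorted(chosen, key=lambda a: a.start1)


if __name__ == "__main__":
    a = AnchorPoint(start1=10, start2=12, length=5, score=3.0)
    b = AnchorPoint(start1=30, start2=20, length=4, score=2.0)
    c = AnchorPoint(start1=18, start2=40, length=6, score=1.0)  # crosses b
    for anchor in select_anchors([a, b, c]):
        print(anchor)
```

In this example, anchor c lies before b in the first sequence but after it in the second, so the two cannot both be used; c has the lower score and is therefore the one that is dropped. For more than two sequences the consistency check is more involved and is not covered by this sketch.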

In addition to including expert knowledge in otherwise automatically produced MSAs, the anchored-alignment option in
