Multiple Sequence Alignment with DIALIGN - Multiple Sequence Alignment Methods


DIALIGN-T is a complete re-implementation of DIALIGN [29]. Like the first implementation of DIALIGN, it starts by calculating all pairwise alignments of the input sequences [16, 30]; that is, an optimal chain of fragments is calculated for each pair of input sequences. The difference from previous versions of the program lies in the way these similarities are integrated into a final multiple alignment. As in the first implementation, a greedy heuristic is used, but DIALIGN-T uses various tricks to prevent the algorithm from aligning spurious, isolated random similarities that might prevent a greedy method from finding a biologically correct global alignment. For example, DIALIGN-T considers not only the local degree of similarity in a fragment but also its context within the two aligned sequences: fragments that belong to a high-scoring pairwise alignment are preferred to isolated fragments. Together with some other heuristics, this led to a considerable improvement in performance compared with the original implementation of DIALIGN.
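The pairwise step described above, finding an optimal chain of fragments for two sequences, can be illustrated with a small dynamic-programming sketch. This is not the DIALIGN-T code: the `Fragment` type, its `weight` field, and the quadratic-time recursion are assumptions chosen for readability, and the weights in the usage note are made up.

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    """Gap-free local match: seq1[i:i+length] aligned with seq2[j:j+length]."""
    i: int         # start position in the first sequence
    j: int         # start position in the second sequence
    length: int    # number of aligned residue pairs (assumed >= 1)
    weight: float  # hypothetical similarity score of the fragment


def best_chain(fragments: List[Fragment]) -> Tuple[float, List[Fragment]]:
    """Return the maximum-weight chain of mutually compatible fragments."""
    if not fragments:
        return 0.0, []
    # Sorting by start position guarantees that every possible predecessor
    # of a fragment appears before it in the list.
    frags = sorted(fragments, key=lambda f: (f.i, f.j))
    n = len(frags)
    score = [f.weight for f in frags]  # best chain score ending at each fragment
    prev = [-1] * n                    # back-pointers for the traceback
    for b in range(n):
        fb = frags[b]
        for a in range(b):
            fa = frags[a]
            # fa may precede fb only if it ends before fb starts in BOTH sequences.
            compatible = fa.i + fa.length <= fb.i and fa.j + fa.length <= fb.j
            if compatible and score[a] + fb.weight > score[b]:
                score[b] = score[a] + fb.weight
                prev[b] = a
    # Trace back from the best-scoring chain end.
    k = max(range(n), key=score.__getitem__)
    best = score[k]
    chain = []
    while k != -1:
        chain.append(frags[k])
        k = prev[k]
    chain.reverse()
    return best, chain
```

For example, calling `best_chain` on the fragments `(0, 0, 5, 4.0)`, `(6, 7, 4, 3.0)`, `(3, 10, 6, 5.0)` and `(12, 18, 3, 2.0)` keeps the first, second and fourth fragment; the third has the highest individual weight but overlaps the others and is left out of the chain. A context-aware method in the spirit of DIALIGN-T would additionally favour fragments that are part of such a high-scoring chain over isolated ones.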

In DIALIGN-TX [31], more sophisticated methods were used to reduce the influence of isolated local similarities. This implementation relies on the traditional progressive approach to multiple alignment [1-3] and adapts this approach to the focus on local similarities that is used in DIALIGN. In a first step, a guide tree is calculated for the input sequences. This is done by transforming the fragment-based similarities in the pairwise alignments into distance values. As in more traditional progressive methods, sequences and groups of previously aligned sequences are then aligned, going from the tips to the root of the guide tree.
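The guide-tree step can be sketched as follows. The text does not specify how similarities are turned into distances or which clustering method builds the tree, so both choices here are assumptions: the transform `1 - s / max(s)` and average-linkage (UPGMA-style) clustering are stand-ins, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def similarities_to_distances(
    sim: Dict[Tuple[str, str], float]
) -> Dict[FrozenSet[str], float]:
    """Map pairwise similarity scores to distances in [0, 1] (assumed transform)."""
    top = max(sim.values())
    return {frozenset(pair): 1.0 - s / top for pair, s in sim.items()}


def merge_order(
    names: Sequence[str], dist: Dict[FrozenSet[str], float]
) -> List[Tuple[Cluster, Cluster]]:
    """Average-linkage clustering; returns the merges from the tips to the root."""

    def average_distance(c1: Cluster, c2: Cluster) -> float:
        # Mean distance over all pairs of sequences drawn from the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters: List[Cluster] = [(name,) for name in names]
    merges: List[Tuple[Cluster, Cluster]] = []
    while len(clusters) > 1:
        # Join the two closest clusters, as a progressive aligner would
        # align the two most similar sequences or groups next.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_distance(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges
```

Each entry of the returned list names the two groups that would be aligned at that step, so iterating over it in order corresponds to walking the guide tree from its tips to its root.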

In progressive methods such as CLUSTAL, a group of previously aligned sequences is represented as a profile, i.e. as a matrix containing the residue frequencies for each alignment column. This cannot be generalized to the segment-based approach, where an alignment is seen as a set of local homologies and parts of the sequences may remain unaligned. DIALIGN-TX therefore uses a different approach to align two groups G1 and G2 of previously aligned sequences. Fragments are selected, each of which aligns one sequence from G1 with one sequence from G2. A graph algorithm is then used to remove fragments that are inconsistent with the previously selected fragments aligning sequences within G1 and within G2 [32].
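For contrast, the profile representation used by conventional progressive aligners can be sketched in a few lines. This is only an illustration of the general idea, not CLUSTAL's implementation; the function name, the `-` gap symbol, and the choice to ignore gaps when computing frequencies are assumptions.

```python
from collections import Counter
from typing import Dict, List


def column_profile(alignment: List[str], gap: str = "-") -> List[Dict[str, float]]:
    """Residue frequencies for each column of a gapped alignment."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    profile: List[Dict[str, float]] = []
    for column in zip(*alignment):
        # Count residues only; gap symbols do not contribute to the frequencies.
        counts = Counter(symbol for symbol in column if symbol != gap)
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile
```

The sketch makes the limitation visible: it presupposes that every row has the same length, so that each residue sits in a well-defined column. In a segment-based alignment, where some regions are left unaligned, no such column structure exists for those regions, which is why a fragment-based way of joining two groups is needed.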

4 Using Matches to Pfam for Improved Protein Alignment

Traditional alignment approaches are based on primary-sequence information only. In one way or another, they define an alignment score based on detectable primary-sequence similarity and then try to calculate optimal or near-optimal alignments in the sense of this scoring scheme. Such approaches are clearly reasonable where no
