DIALIGN-T is a complete re-implementation of DIALIGN [29]. Like the
first implementation of DIALIGN, it starts by calculating all
pairwise alignments of the input sequences [16, 30].
That is, an optimal chain of fragments is calculated for each pair
of input sequences. The difference from previous versions of the
program lies in the way these similarities are integrated into a final
multiple alignment. As in the first implementation, a greedy
heuristic is used, but DIALIGN-T uses various tricks to prevent
the algorithm from aligning spurious, isolated random similarities,
which might prevent a greedy method from finding a biologically
correct global alignment. For example, DIALIGN-T considers not only
the local degree of similarity in a fragment but also its
context within the two aligned sequences. Fragments that belong to
a high-scoring pairwise alignment are preferred to isolated
fragments. Together with some other heuristics, this led to a
considerable improvement in performance compared with the original
implementation of DIALIGN.
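The effect of taking context into account can be illustrated with a minimal Python sketch. The source does not give the weighting formula used by DIALIGN-T, so the adjustment below (raw fragment weight multiplied by the total weight of all fragments in the same pairwise alignment) is a hypothetical stand-in, and the names `Fragment`, `pairwise_totals` and `rank_by_context` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    seq_a: int     # index of the first sequence
    seq_b: int     # index of the second sequence
    weight: float  # local similarity score of the fragment


def pairwise_totals(fragments):
    """Sum the fragment weights for each pair of sequences."""
    totals = {}
    for f in fragments:
        key = (f.seq_a, f.seq_b)
        totals[key] = totals.get(key, 0.0) + f.weight
    return totals


def rank_by_context(fragments):
    """Order fragments for greedy processing by a context-adjusted weight.

    ASSUMPTION: the adjustment (raw weight times the total weight of the
    pairwise alignment the fragment belongs to) is hypothetical and only
    illustrates the idea; it is not the formula used by DIALIGN-T.
    """
    totals = pairwise_totals(fragments)
    return sorted(
        fragments,
        key=lambda f: f.weight * totals[(f.seq_a, f.seq_b)],
        reverse=True,
    )


fragments = [
    # Three fragments from one high-scoring pairwise alignment.
    Fragment(0, 1, 6.0),
    Fragment(0, 1, 5.0),
    Fragment(0, 1, 4.0),
    # One isolated fragment with the highest raw weight.
    Fragment(0, 2, 7.0),
]

by_raw = sorted(fragments, key=lambda f: f.weight, reverse=True)
by_context = rank_by_context(fragments)
```

With raw weights alone, the isolated fragment between sequences 0 and 2 would be processed first by a greedy method. With the context-adjusted ranking it is processed last, after the fragments that belong to the high-scoring pairwise alignment.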
In DIALIGN-TX [31], more sophisticated methods were used
to reduce the influence of isolated local similarities. This implementation
relies on the traditional progressive approach to multiple
alignment [1-3] and adapts it to the focus on local
similarities used in DIALIGN. In a first step, a guide tree is
calculated for the input sequences. This is done by transforming the
fragment-based similarities in the pairwise alignments into distance
values. As in more traditional progressive methods, sequences and
groups of previously aligned sequences are aligned, going from the
tips to the root of the guide tree.
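The guide-tree step can be sketched in Python as follows. Two assumptions are made that the source does not confirm: the transformation from similarity to distance is taken to be d = 1 - s / s_max, and the tree is built by average-linkage clustering (UPGMA). The function names are illustrative. The returned list of merges gives the order in which sequences and groups would be aligned, from the tips towards the root.

```python
def similarities_to_distances(sim):
    """Turn pairwise similarity scores into distances.

    sim maps a pair (i, j) of sequence indices to the score of the
    pairwise fragment chain. ASSUMPTION: d = 1 - s / s_max is a
    hypothetical transformation chosen for illustration.
    """
    s_max = max(sim.values())
    return {pair: 1.0 - s / s_max for pair, s in sim.items()}


def upgma(n, dist):
    """Average-linkage clustering of n sequences.

    Returns the guide tree as nested tuples and the list of merges in
    the order in which they were made (tips first, root last).
    """
    clusters = {i: (i, 1) for i in range(n)}  # id -> (subtree, size)
    d = {frozenset(pair): value for pair, value in dist.items()}
    next_id = n
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(
            ((x, y) for x in clusters for y in clusters if x < y),
            key=lambda p: d[frozenset(p)],
        )
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        # Distance from the new cluster to each remaining one is the
        # size-weighted average of the distances of its two parts.
        for c in clusters:
            d[frozenset((next_id, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        merges.append((tree_a, tree_b))
        next_id += 1
    (tree, _size), = clusters.values()
    return tree, merges


# Toy similarity scores for four sequences.
sim = {
    (0, 1): 10.0, (0, 2): 2.0, (0, 3): 1.0,
    (1, 2): 3.0, (1, 3): 1.0, (2, 3): 8.0,
}
tree, merges = upgma(4, similarities_to_distances(sim))
```

In this toy input, sequences 0 and 1 are the most similar pair and are joined first, followed by sequences 2 and 3, and the two resulting groups are joined at the root.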
In progressive methods such as CLUSTAL, a group of previously
aligned sequences is represented as a profile, i.e. as a matrix
containing the residue frequencies for each alignment column. This
cannot be generalized to the segment-based approach, where an
alignment is seen as a set of local homologies and parts of the
sequences may remain unaligned. DIALIGN-TX therefore uses a
different approach to align two groups G1 and G2 of previously
aligned sequences. Fragments are selected, each of which aligns one
sequence from G1 with one sequence from G2. A graph algorithm [32]
is then used to remove fragments that are inconsistent with the
previously selected fragments that align the sequences within G1
and within G2 to each other.
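What consistency means when two groups are joined can be shown with a simplified Python sketch. It is not the graph algorithm of [32]: a plain greedy filter is used instead. As a further simplification, each group is given as a gapped alignment in which every residue has a column, whereas in the segment-based setting parts of the sequences may remain unaligned. All names (`CrossFragment`, `column_index`, `select_consistent`) are illustrative.

```python
from typing import NamedTuple


class CrossFragment(NamedTuple):
    seq1: str     # sequence from group G1
    start1: int   # start position (ungapped) in seq1
    seq2: str     # sequence from group G2
    start2: int   # start position (ungapped) in seq2
    length: int
    weight: float


def column_index(rows):
    """Map (sequence name, residue position) to its alignment column."""
    cols = {}
    for name, row in rows.items():
        pos = 0
        for col, ch in enumerate(row):
            if ch != "-":
                cols[(name, pos)] = col
                pos += 1
    return cols


def column_pairs(frag, cols1, cols2):
    """Column pairs (column in G1, column in G2) implied by a fragment."""
    return [
        (cols1[(frag.seq1, frag.start1 + k)], cols2[(frag.seq2, frag.start2 + k)])
        for k in range(frag.length)
    ]


def is_chain(pairs):
    """True if the column pairs are one-to-one and order-preserving."""
    ordered = sorted(set(pairs))
    return all(a[0] < b[0] and a[1] < b[1] for a, b in zip(ordered, ordered[1:]))


def select_consistent(fragments, g1, g2):
    """Greedy filter (ASSUMPTION: a stand-in for the method of [32])."""
    cols1, cols2 = column_index(g1), column_index(g2)
    accepted, used_pairs = [], []
    for frag in sorted(fragments, key=lambda f: f.weight, reverse=True):
        pairs = column_pairs(frag, cols1, cols2)
        if is_chain(used_pairs + pairs):
            accepted.append(frag)
            used_pairs += pairs
    return accepted


g1 = {"a": "ACGT-A", "b": "AC-TTA"}
g2 = {"c": "ACGTA", "d": "A-GTA"}
fragments = [
    CrossFragment("a", 0, "c", 0, 3, 9.0),
    CrossFragment("b", 3, "d", 2, 2, 5.0),
    CrossFragment("b", 0, "c", 3, 2, 4.0),  # conflicts with the first fragment
]
accepted = select_consistent(fragments, g1, g2)
```

The third fragment would align the first column of G1 with the fourth column of G2, although the first fragment has already aligned that column of G1 with the first column of G2. Given the existing alignments within G1 and G2, the two fragments cannot both be kept, so the fragment with the lower weight is dropped.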
4 Using Matches to Pfam for Improved Protein Alignment
Traditional alignment approaches are based on primary-sequence
information only. In one way or another, they define an alignment
score based on detectable primary-sequence similarity and then try
to calculate optimal or near-optimal alignments in the sense of this
scoring scheme. Such approaches are clearly reasonable where no