Biology Reference
In-Depth Information
Seeding the dynamic program using consecutive regions of con-
served subsequences is directly generalized to multiple sequence
alignment problems. In the general form, fixed length of subse-
quences shared by all sequences are used as seed candidates and
significant clusters are selected as seeds [ 29 ]. Bounded dynamic
programming algorithms are employed to bridge the seeds. Greedy
versions of seeding by utilizing the conserved subsequences of
sequence pairs and assembling the alignment by combining pairs
progressively [ 30 ] can also be accepted as in this class multiple
alignment algorithms.
4.3 Use of Seeds
in Multiple Sequence
Alignment
When the nature of biological sequence alignment problem is
investigated, a tree-based alignment model appears to be realistic
as similar sequences are expected to evolve from a close common
ancestor. In such an approach a divergence from dynamic program-
ming in multidimensional grids is appropriate because sum-of-pairs
scoring scheme assigns equivalent importance to each sequence,
disregarding their evolutionary distances. The popularity of tree-
based alignment algorithms stems from this fact, as well as requir-
ing much less computations.
Progressive alignment programs generally have two sequential
steps. First, an evolutionary tree is estimated. Second, consensus
sequences are generated and updated iteratively using pairwise
alignments. Pairwise sequence alignment is the core procedure
employed in progressive alignment.
While pairwise alignment is the major part of the second phase,
it can also be employed in estimating the evolutionary tree. For
example, CLUSTAL series of programs [ 31 ] use linear-space
dynamic programming for building the similarity matrix to be
used in building the guide tree. Dynamic programming schemes
we have discussed are the progressive alignment algorithms
employed in successful multiple alignment programs such as
CLUSTALW, T-Coffee [ 32 ], MUSCLE [ 33 ], MAFFT [ 34 ], and
GramAlign [ 35 ].
4.4 Dynamic
Programming and
Progressive Sequence
Alignment
References
1. Waterman MS (1995) Introduction to compu-
tational biology. Chapman & Hall, London
2. Bellman R (1958) On a routing problem. Q
Appl Math 16:87-90
3. Fredman ML, Tarjan RE (1987) Fibonacci
heaps and their uses in improved network opti-
mization algorithms. J Assoc Comput Mach
3:596-615
4. Needleman SB, Wunsch CD (1970) A general
method applicable to the search for similarities
in the amino acid sequences of two proteins. J
Mol Biol 48:443-453
5. Waterman MS, Smith TF, Beyer WA (1976)
Some biological sequence metrics. Adv Math
20:367-387
6. Waterman MS, Smith TF (1981) Identification
of common molecular subsequences. J Mol
Biol 147:195-197
7. Waterman MS (1981) Sequence alignments in
the neighborhood of the optimum with gen-
eral application to dynamic programming. Proc
Natl Acad Sci USA 80:3123-3124
8. Gotoh O (1982) An improved algorithm for
matching biological sequences. J Mol Biol
162:705-708
Search WWH ::




Custom Search