Dynamic Programming - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

Seeding the dynamic program using consecutive regions of con-

served subsequences is directly generalized to multiple sequence

alignment problems. In the general form, fixed length of subse-

quences shared by all sequences are used as seed candidates and

significant clusters are selected as seeds [ 29 ]. Bounded dynamic

programming algorithms are employed to bridge the seeds. Greedy

versions of seeding by utilizing the conserved subsequences of

sequence pairs and assembling the alignment by combining pairs

progressively [ 30 ] can also be accepted as in this class multiple

alignment algorithms.

4.3 Use of Seeds

in Multiple Sequence

Alignment

When the nature of biological sequence alignment problem is

investigated, a tree-based alignment model appears to be realistic

as similar sequences are expected to evolve from a close common

ancestor. In such an approach a divergence from dynamic program-

ming in multidimensional grids is appropriate because sum-of-pairs

scoring scheme assigns equivalent importance to each sequence,

disregarding their evolutionary distances. The popularity of tree-

based alignment algorithms stems from this fact, as well as requir-

ing much less computations.

Progressive alignment programs generally have two sequential

steps. First, an evolutionary tree is estimated. Second, consensus

sequences are generated and updated iteratively using pairwise

alignments. Pairwise sequence alignment is the core procedure

employed in progressive alignment.

While pairwise alignment is the major part of the second phase,

it can also be employed in estimating the evolutionary tree. For

example, CLUSTAL series of programs [ 31 ] use linear-space

dynamic programming for building the similarity matrix to be

used in building the guide tree. Dynamic programming schemes

we have discussed are the progressive alignment algorithms

employed in successful multiple alignment programs such as

CLUSTALW, T-Coffee [ 32 ], MUSCLE [ 33 ], MAFFT [ 34 ], and

GramAlign [ 35 ].

4.4 Dynamic

Programming and

Progressive Sequence

Alignment

References

1. Waterman MS (1995) Introduction to compu-

tational biology. Chapman & Hall, London

2. Bellman R (1958) On a routing problem. Q

Appl Math 16:87-90

3. Fredman ML, Tarjan RE (1987) Fibonacci

heaps and their uses in improved network opti-

mization algorithms. J Assoc Comput Mach

3:596-615

4. Needleman SB, Wunsch CD (1970) A general

method applicable to the search for similarities

in the amino acid sequences of two proteins. J

Mol Biol 48:443-453

5. Waterman MS, Smith TF, Beyer WA (1976)

Some biological sequence metrics. Adv Math

20:367-387

6. Waterman MS, Smith TF (1981) Identification

of common molecular subsequences. J Mol

Biol 147:195-197

7. Waterman MS (1981) Sequence alignments in

the neighborhood of the optimum with gen-

eral application to dynamic programming. Proc

Natl Acad Sci USA 80:3123-3124

8. Gotoh O (1982) An improved algorithm for

matching biological sequences. J Mol Biol

162:705-708

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home