Dynamic Programming - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

M

2 matrices holding S R ( i , j ) scores is called to sum up

the pairwise scores for the coordinate pairs of c

ð

M

þ

1

Þ=

. This

approximation can be extended from pair scores to score combina-

tions of multiple alignment N -tuples ( N

ð

k 1

;

k 2

; ...;

k M

Þ

<

M ). While estimating

P

frommultiple alignment scores is a better approx-

imation, calculating multiple alignment combinations of N

sequences might be costly. Nevertheless, using triplet scores is

reported to be more efficient than the using pairwise alignments

when the overall procedure is considered for a few sequences [ 26 ].

The threshold determining the search subset can be done by itera-

tive deepening as in the case of pairwise algorithm, or a fast heur-

istics approximating the multiple alignment (e.g., progressive

alignment) can be used [ 27 ].

Further reduction in space complexity is achieved by combin-

ing bounded dynamic programming with Hirschberg's insight.

The corresponding method is usually referred to as the divide-

and-conquer frontier search [ 26 ]. The frontier is the active region

of the search space that is used in the computations. Since the

grandparents of the cells are not used in the score calculation, a

cell is not kept in the memory after the scores of its children are

computed. Obviously, the memory to be used in the backtrace

phase is lost. Hirschberg's algorithm keeps the ancestors of each

active cell located in a selected median hyperplane of the grid. The

ancestor list is updated at each cell computation. Finally, the ances-

tor of the target cell is determined to be the point where the

optimal path crosses the hyperplane. The problem is then split

into two subproblems as in the pairwise application, and the opti-

mal path is found recursively.

ð

k 1 ;

k 2 ; ...;

k M Þ

Divide-and-conquer methods have the similar subproblem splitting

insight with Hirschberg's algorithm, but they differ in how the split

points are found. The sequences are parsed with fast methods

avoiding dynamic programming in multidimensional grid. Once

a number of subproblems are defined, each problem is solved by a

multiple sequence alignment algorithm of choice.

Some methods determine the binary split points directly using

pairwise alignments [ 28 ]. Combining the forward and the back-

ward alignment matrices S F , and S R the optimal split of the prefix of

the suffix of a sequence be found as in pairwise Hirschberg's

algorithm. An approximate cut point is found for each sequence

by summing up the score matrices for each pairwise alignment

associated. The overall maximum score determines an average

split point. This procedure is repeated a few times and the

corresponding subproblems are solved conventionally. The final

multiple sequence alignment is simply the concatenation of all

subsolutions.

4.2 Divide-and-

Conquer Methods

in Multiple Sequence

Alignment

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home