Dynamic Programming - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

diagonal from the source to the target cell. The width of this

diagonal band is determined by the tolerance of the number of

insertions and deletions allowed. Obviously, the number of cell

scores to be computed is proportional to this bandwidth as well as

the sequence length. A narrow bandwidth offers a quick solution,

but it may miss the higher scoring paths beyond the search band.

Combining banded dynamic programming solutions with

other variations such as Hirschberg's algorithm is applicable and

used in some early sequence alignment programs [ 16 ].

Selecting a subset from the alignment grid using a constant band is

simple but it has some serious drawbacks. An optimal alignment

might follow a path off the banded coverage and there is no

effective way of mitigating such a problem for a static method.

Adapting the search space by including potentially optimal paths

in the search space is capable of handling a more general class of

alignment problems. Instead of limiting the search space using

spatial bounds, alignment cell scores determine the boundaries.

The potential search space is explored by enabling the children of

cells with scores higher than an upper bound, and this methodol-

ogy is generally called as bounded dynamic programming [ 17 ].

Assume we compute the score S ( i , j ). For a given threshold

3.3.2 Bounded Dynamic

Programming

the children of c ( i , j ) are included in the search space if S ( i , j )+ P

( i , j )

. Here, P ( i , j ) is an estimation of the optimal path score

from c ( i , j ) to the ending corner. In the ideal case, if we had the

exact values of P ( i , j ), the search space would be greatly reduced

which only consist of the near-optimal paths with scores greater

than

> τ

. Good approximations of the matrix P and the threshold

value can provide decent reduction in the space and time complex-

ity. The approximation of the remaining path score is a function of

the distance of c ( i , j ) to the lower-right corner. As an example, we

can choose an upper approximation with the scenario that the path

returns to the diagonal (i.e., gaps) and travels to the destination on

the diagonal (i.e., matches). The corresponding score is

;

Þ ¼ jðj

Þðj

Þj

gap penalty

min

fðj

Þ; ðj

Þg

match score

(14)

In Fig. 8 , the alignment matrix of human and gorilla mitochondrial

genomes are shown. The red line shows the paths with the optimal

alignment, where the black region is the section selected by the

thresholding described. Bounded dynamic programming in this

example finds the exact solution with 60 % less number of cell

computations (i.e., 2.5 times faster).

Determining the bounding threshold

is difficult, and the

complexity of the algorithm is sensitive on this threshold. A large

threshold includes large number of cells in the search space, while

a low threshold prunes large number of cells and it cannot return a

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home