Biology Reference
In-Depth Information
diagonal from the source to the target cell. The width of this
diagonal band is determined by the tolerance of the number of
insertions and deletions allowed. Obviously, the number of cell
scores to be computed is proportional to this bandwidth as well as
the sequence length. A narrow bandwidth offers a quick solution,
but it may miss the higher scoring paths beyond the search band.
Combining banded dynamic programming solutions with
other variations such as Hirschberg's algorithm is applicable and
used in some early sequence alignment programs [ 16 ].
Selecting a subset from the alignment grid using a constant band is
simple but it has some serious drawbacks. An optimal alignment
might follow a path off the banded coverage and there is no
effective way of mitigating such a problem for a static method.
Adapting the search space by including potentially optimal paths
in the search space is capable of handling a more general class of
alignment problems. Instead of limiting the search space using
spatial bounds, alignment cell scores determine the boundaries.
The potential search space is explored by enabling the children of
cells with scores higher than an upper bound, and this methodol-
ogy is generally called as bounded dynamic programming [ 17 ].
Assume we compute the score S ( i , j ). For a given threshold
3.3.2 Bounded Dynamic
Programming
,
the children of c ( i , j ) are included in the search space if S ( i , j )+ P
( i , j )
τ
. Here, P ( i , j ) is an estimation of the optimal path score
from c ( i , j ) to the ending corner. In the ideal case, if we had the
exact values of P ( i , j ), the search space would be greatly reduced
which only consist of the near-optimal paths with scores greater
than
> τ
. Good approximations of the matrix P and the threshold
value can provide decent reduction in the space and time complex-
ity. The approximation of the remaining path score is a function of
the distance of c ( i , j ) to the lower-right corner. As an example, we
can choose an upper approximation with the scenario that the path
returns to the diagonal (i.e., gaps) and travels to the destination on
the diagonal (i.e., matches). The corresponding score is
τ
P
ð
i
;
j
Þ ¼ jðj
X
j
i
Þðj
Y
j
j
Þj
gap penalty
þ
min
fðj
X
j
i
Þ; ðj
Y
j
j
Þg
match score
:
(14)
In Fig. 8 , the alignment matrix of human and gorilla mitochondrial
genomes are shown. The red line shows the paths with the optimal
alignment, where the black region is the section selected by the
thresholding described. Bounded dynamic programming in this
example finds the exact solution with 60 % less number of cell
computations (i.e., 2.5 times faster).
Determining the bounding threshold
is difficult, and the
complexity of the algorithm is sensitive on this threshold. A large
threshold includes large number of cells in the search space, while
a low threshold prunes large number of cells and it cannot return a
τ
Search WWH ::




Custom Search