Biology Reference
In-Depth Information
The dynamic programming scheme and its modifications we have
introduced to solve global, semi-global, local and near-optimal
sequence alignments require quadratic space and cubic time as a
function of sequence length. It has been shown that with an appro-
priate change in the alignment scoring method the complexity can
be reduced to quadratic time. Additional to this improvement, the
problem can be solved exactly in linear space. Such a space-time
reduction is a very significant gain in practical terms. Considering
the length of the biological sequences, the improvement means
about 100-1,000 folds speed ups, and the same rate of less memory
usage for pairwise alignments. The gain is more dramatic for multi-
ple sequence alignment programs that could range between tens of
thousands to million-fold. Many modern sequence alignment pro-
grams use the modified version of the algorithm based on the same
improvement principles.
Here, we will briefly see two fundamental performance
improvement modifications, one reducing the time complexity
from cubic to quadratic and the other providing linear-space solu-
tions. Gotoh showed that changing the gap penalty rules to affine
gap penalties, the score computation can be performed in
3.2 Algorithmic
Improvements on the
Dynamic Programming
Methods
n 2 )
[ 8 ]. Later, by incorporating Hirschberg's method [ 9 ] to the
sequence alignment algorithm, it was shown that saving the full
matrix of optimal path scores is not necessary in order to perform
the same computation.
The concave gap penalty functions used in the original Needle-
man-Wunsch algorithm assign an individual score to each consec-
utive gap of different length. Therefore a cell c ( i , j ) has i + j parents
and the score of each one has to be evaluated and considered
in computing S ( i , j ) (e.g., Eq. 6 ). This concave function can be
defined
3.2.1 Quadratic-Time
Alignment by Affine Gap
Penalties
in
the
form of
a
piecewise
linear
function
g
a , where a is the penalty for an
introduction of new gap and b is the penalty for each residue
extension of a gap. In this form of constant score increase for
consecutive gaps, the total gap penalty can be computed recursively
instead of performing a gap score computation for each parent.
Consider the second and the third computation lines in the recur-
sive function Eq. 6 . The gap penalty scores
ð
k
Þ¼
a
þ
b
ð
k
1
Þ
,0
<
b
<
e i;j ð
i
p
;
j
Þ¼
e i;j
by definition. The scores of
the vertical and the horizontal edges are then
ð
i
;
j
p
Þ¼
g
ð
p
Þ¼
a
þ
b
ð
p
1
Þ
V
ð
i
;
j
Þ¼
max
p
S
ð
i
p
;
j
Þþ
g
ð
p
Þ;
0
<
p
i
(9)
H
ð
i
;
j
Þ¼
max
q
S
ð
i
;
j
q
Þþ
g
ð
q
Þ;
0
<
q
j
:
(10)
Search WWH ::




Custom Search