Biology Reference
In-Depth Information
the dynamic programming is formulated as follows. The optimal
path to the cell c ( i , j ) is the best path among the combinations
of the optimal path from the upper-left cell to the parents of c ( i , j )
and the edge between the parent and c ( i , j ). From the alignment
graph, we can see that the parents of the cell c ( i , j ) are the cell
c
ð
i
1
;
j
1
Þ
and the cells c ( p , j ), p
<
i , and c ( i , q ), q
<
j . The
recursive function turns out to be
8
<
S
ð
i
1
;
j
1
Þþ
e i;j ð
i
1
;
j
1
Þ
max
p
S
ð
i
p
;
j
Þþ
e i;j
ð
i
p
;
j
Þ
0
<
p
i
S
ð
i
;
j
Þ¼
max
(6)
:
max
q
S
ð
i
;
j
q
Þþ
e i;j
ð
i
;
j
q
Þ
0
<
q
j
S ( i , j ) denotes the maximal path score from the upper-left corner to
the cell c ( i , j ) and e i , j ( x , y ) is the score of the edge from the parent
cell c ( x , y )to c ( i , j ).
A standard approach to calculate the best alignment score is to
start the recursion from the upper left corner and calculate S ( i , j )by
scanning the matrix line by line, i.e., incrementing i and j in a
double loop. At the end of
j
X
jj
Y
j
(the length of the sequences)
iterations, S (
) is calculated as the optimal alignment score.
This matrix scan constitutes the first phase of the algorithm. In a
second reverse scan, starting from the cell (
j
X
j
,
j
Y
j
), the edges
leading to the cell c ( i , j ) are traced back by finding the parent cell
using the score change. The reverse-scan procedure records the
sequence of the edges in the optimal path and the alignment is
recovered from this sequence.
As an example, we can apply Needleman-Wunsch algorithm
to the DNA sequences X
j
X
j
,
j
Y
j
GAGACAT with
the scoring scheme in which the same residues are awarded with +2
score, different residues are punished by
¼
GACAT and Y
¼
0.5 score and the
gaps have the penalty function
.
Figure 5 shows the resulting cell scores calculated by the procedure,
and the corresponding path recovered by tracing back from the
lower-right cell.
It is notable that starting the bottom-up recursive algorithm
from the top-left corner, or from the bottom-right corner using the
dual graph finds the same alignment as the directionality will not
alter the problem. In fact in the original Needleman-Wunsch algo-
rithm the recursion was applied in the reverse direction.
More than one optimal alignment might exist as multiple paths
on the grid can have the same score. In multiple optimal alignment
case, the reverse-scan phase might select one of the paths randomly
as it detects multiple parents leading to the same score, or an extra
backtracing thread can be generated at each such instance, resulting
in exploring all optimal alignments.
Needleman-Wunsch algorithm requires
f
1
;
1
:
1
;
1
:
11
;
1
:
111
; ...g
n 2 ))
space since it records the optimal alignment score for every cell.
j
X
jj
Y
j
(
Search WWH ::




Custom Search