Dynamic Programming - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

the dynamic programming is formulated as follows. The optimal

path to the cell c ( i , j ) is the best path among the combinations

of the optimal path from the upper-left cell to the parents of c ( i , j )

and the edge between the parent and c ( i , j ). From the alignment

graph, we can see that the parents of the cell c ( i , j ) are the cell

;

and the cells c ( p , j ), p

i , and c ( i , q ), q

j . The

recursive function turns out to be

;

Þþ

e i;j ð

;

max

;

Þþ

e i;j

;

Þ¼

max

(6)

max

;

Þþ

e i;j

;

S ( i , j ) denotes the maximal path score from the upper-left corner to

the cell c ( i , j ) and e i , j ( x , y ) is the score of the edge from the parent

cell c ( x , y )to c ( i , j ).

A standard approach to calculate the best alignment score is to

start the recursion from the upper left corner and calculate S ( i , j )by

scanning the matrix line by line, i.e., incrementing i and j in a

double loop. At the end of

(the length of the sequences)

iterations, S (

) is calculated as the optimal alignment score.

This matrix scan constitutes the first phase of the algorithm. In a

second reverse scan, starting from the cell (

), the edges

leading to the cell c ( i , j ) are traced back by finding the parent cell

using the score change. The reverse-scan procedure records the

sequence of the edges in the optimal path and the alignment is

recovered from this sequence.

As an example, we can apply Needleman-Wunsch algorithm

to the DNA sequences X

GAGACAT with

the scoring scheme in which the same residues are awarded with +2

score, different residues are punished by

GACAT and Y

0.5 score and the

gaps have the penalty function

Figure 5 shows the resulting cell scores calculated by the procedure,

and the corresponding path recovered by tracing back from the

lower-right cell.

It is notable that starting the bottom-up recursive algorithm

from the top-left corner, or from the bottom-right corner using the

dual graph finds the same alignment as the directionality will not

alter the problem. In fact in the original Needleman-Wunsch algo-

rithm the recursion was applied in the reverse direction.

More than one optimal alignment might exist as multiple paths

on the grid can have the same score. In multiple optimal alignment

case, the reverse-scan phase might select one of the paths randomly

as it detects multiple parents leading to the same score, or an extra

backtracing thread can be generated at each such instance, resulting

in exploring all optimal alignments.

Needleman-Wunsch algorithm requires

;

111

; ...g

n 2 ))

space since it records the optimal alignment score for every cell.

Oð

(

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home