Dynamic Programming - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

that the scores can be computed in sequence-length-size memory,

Hirschberg's algorithm can be run in linear space.

At the implementation level, different versions of the linear-

space algorithm are available [ 11 , 12 ]. As a variation, the median

edge can be found by only using a forward scan of score computa-

tion [ 13 ]. In this version, the forward score computation continues

when the median line is reached. For each cell in the

cþ

1th

line, S F

are computed and the median edges result-

ing in each one of these scores are recorded. Clearly an additional

sequence-length memory space is needed for this ancestral records.

At iteration k , for each cell in the

ðbj

cþ

;

th line, the cell

visited by the optimal path in the previous line is determined by

the score computation. Therefore, the median edge for the current

cell can be updated at each iteration. At the final iteration, the

median edge assigned to the lower-right cell is determined to be

the median edge belonging to the solution path.

The linear-space algorithm using median edge candidate

records provides a space-time trade-off flexibility. The algorithm

can be generalized to m splits of subproblems instead of binary

splits at each recursion. Clearly, keeping the records of the edges

crossing the

ðbj

cþ

th line, we can detect the edge belonging to

the optimal path at that portion. In a forward score computation

pass, we can keep records of such edges for every

/ m

lines

). This would require m sequence size memory

spaces. At each iteration m mid-edges are discovered, and m sub-

problems are defined. For iteration k , m k subproblems are defined

and each problem requires ( m k ) 2 fold less computation than the

original problem. In this case, the amount of computation is

¼ P k¼ 0 S =

3 k

. This space-time trade-

off might be important in practical applications.

As we have seen the

¼ S þ S =ð

Hirschberg

n 2 ) time and

n ) space algorithms are

available by affine gap penalties and Hirschberg's algorithm for

global alignments, the semi-global alignment extension is straight-

forward. Initializing the first row and columns of S , V , and H

matrices and defining g ( p )

Oð

0 for the last rows and columns is

sufficient. On the other hand, additional requirements are needed

to perform local alignments in linear time [ 14 , 15 ]. For calculating

the scores, the same recursive function equation 12 with the

Smith-Waterman modification is used. Here the scoring modifica-

tion is setting the negative scores to zero. The linear-space scoring

memory is the same as the global alignment case, and the main

difference is a root cell is recorded for computed cell c ( i , j ). Recall

that the Smith-Waterman algorithm terminates a local alignment

when a zero-scored ending cell is met during the backward scan

phase. The root of c ( i , j ) is simply the zero-scored starting point of

the optimal path passing through c ( i , j ). The root is empty when

S ( i , j )

0 and it is initialized when a cell attains a positive score

with an edge coming from a zero-scored parent. This root is

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home