Dynamic Programming - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

The dynamic programming scheme and its modifications we have

introduced to solve global, semi-global, local and near-optimal

sequence alignments require quadratic space and cubic time as a

function of sequence length. It has been shown that with an appro-

priate change in the alignment scoring method the complexity can

be reduced to quadratic time. Additional to this improvement, the

problem can be solved exactly in linear space. Such a space-time

reduction is a very significant gain in practical terms. Considering

the length of the biological sequences, the improvement means

about 100-1,000 folds speed ups, and the same rate of less memory

usage for pairwise alignments. The gain is more dramatic for multi-

ple sequence alignment programs that could range between tens of

thousands to million-fold. Many modern sequence alignment pro-

grams use the modified version of the algorithm based on the same

improvement principles.

Here, we will briefly see two fundamental performance

improvement modifications, one reducing the time complexity

from cubic to quadratic and the other providing linear-space solu-

tions. Gotoh showed that changing the gap penalty rules to affine

gap penalties, the score computation can be performed in

3.2 Algorithmic

Improvements on the

Dynamic Programming

Methods

n 2 )

[ 8 ]. Later, by incorporating Hirschberg's method [ 9 ] to the

sequence alignment algorithm, it was shown that saving the full

matrix of optimal path scores is not necessary in order to perform

the same computation.

Oð

The concave gap penalty functions used in the original Needle-

man-Wunsch algorithm assign an individual score to each consec-

utive gap of different length. Therefore a cell c ( i , j ) has i + j parents

and the score of each one has to be evaluated and considered

in computing S ( i , j ) (e.g., Eq. 6 ). This concave function can be

defined

3.2.1 Quadratic-Time

Alignment by Affine Gap

Penalties

the

form of

piecewise

linear

function

a , where a is the penalty for an

introduction of new gap and b is the penalty for each residue

extension of a gap. In this form of constant score increase for

consecutive gaps, the total gap penalty can be computed recursively

instead of performing a gap score computation for each parent.

Consider the second and the third computation lines in the recur-

sive function Eq. 6 . The gap penalty scores

Þ¼

e i;j ð

;

Þ¼

e i;j

by definition. The scores of

the vertical and the horizontal edges are then

;

Þ¼

;

Þ¼

max

;

Þþ

Þ;

(9)

;

Þ¼

max

;

Þþ

Þ;

(10)

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home