Dynamic Programming - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

Efficient algorithms employed in biological sequence alignment

have been around since the early 1970s. Over the decades, many

improvements and modifications have been achieved. However, the

main methodology enabling sequence alignment has stayed as the

kernel of the programs. Dynamic programming is an efficient com-

putation scheme applied on a class of problems that can be solved

recursively. Defining the alignment problem in such manner enables

the computation of optimal alignments in polynomial time and

memory for a pair of sequences.

Dynamic programming offers a feasible optimal solution for

pairwise alignments. However, an exact application becomes

impractical when multiple sequences are considered. A similar

practical issue is observed when queries are needed to be aligned

against large databases. Several heuristics are proposed for approxi-

mate solutions of the corresponding problems. Yet, the idea behind

most of the popular alignment heuristics is to approximate the

solution by dividing the problem into subpairwise alignment pro-

blems. In that sense, it is possible to say the idea of aligning two

sequences using dynamic programming is a main building block of

the multiple sequence alignment and database search algorithms.

In this chapter, we first review the pairwise sequence alignment

problem and the polynomial time solution offered by dynamic

programming. The minor variations on the algorithms to find

global, semi-global, local, and near-optimal alignments are intro-

duced. We discuss the fundamental algorithmic complexity reduc-

tions and heuristics to provide fast approximate solutions. The final

issue we consider is the relation of multiple sequence alignment and

dynamic programming.

2

The Pairwise Alignment Problem

To pose the problem and the solution program, perhaps the

introduction of pairwise sequence alignment problem is a prefera-

ble starting point because of a couple of reasons. Firstly, the

multiple sequence alignment is generally defined as a direct gener-

alization of pairwise sequence alignment. Therefore, the pairwise

solution offered can also be generalized to the multiple sequence

case. Secondly, the practical applications of multiple sequence align-

ment approximate the solution considering the pairwise alignment

of sequences. A multiple alignment is generally met by extending

the pairwise alignments to a consensus of multiple alignment.

An alignment of two biological sequences is simply an ordered

mapping. Let's say X and Y are two sequences and each element of

a sequence is called a residue. Assume the i th residue of X , X i ,is

paired with Y j . Then, a residue coming after X i cannot match up

with a residue of Y coming before Y j . Moreover, a residue can be

matched with at most one residue and the matchings are not

bijective. If a residue is not matched with another residue, it is

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home