Biomedical Engineering Reference
In-Depth Information
Multiple Sequence Alignment
Applications of multiple sequence alignment—aligning three or more sequences—range from
suggesting homologous relationships between several proteins to predicting probes for other
members of the same family of similar sequences in a proteome. Although multiple sequence
alignment can be performed on nucleotide sequences, it's more often performed on polypeptide
sequences, and draws upon many of the techniques used for single pairwise sequence alignment. In
addition, several novel methods have been devised to deal with the challenge of aligning multiple
sequences. An overview of several key multiple sequence alignment technologies follows.
Dynamic Programming
Dynamic programming methods used for pairwise sequence alignment are easily extended to
encompass multiple sequences, at least in theory. Algorithmically, there is little difference between a
two- or three-dimensional alignment problem, as discussed earlier. However, the computational
requirements for 10 or more relatively short polypeptide sequences are beyond the reach of most
research laboratories. Three- or four-sequence alignment is the limit for workstation-class hardware.
As a result, for desktop work in multiple sequence alignment, several heuristic methods have been
developed that provide results in reasonable time, even though it's usually impossible to prove that
the results achieved through these methods are the best attainable.
Progressive Strategies
Progressive strategies take the salami-slice approach to multiple sequence alignment. Instead of
addressing the multidimensional problem head-on, progressive strategies break the multiple
sequence alignment challenge into a series of pairwise alignment problems. The first pair of
sequences is aligned, and then that result is aligned with the third sequence, and so on, aligning each
subsequent search with the previous alignment. Alternatively, the first pair of sequences can serve as
the basis for aligning all subsequent sequences, which are then combined at the end of the process.
Other alignment schemes are possible as well.
For example, the approach used by the PILEUP program is to start with pairwise alignments that
score the similarity between every possible pair of sequences. These similarity scores are used to
define the order of alignment. That is, PILEUP first aligns the two most-related sequences to each
other in order to produce the first alignment. It then aligns the next most-related sequence to this
alignment or the next two most-related sequences to each other in order to produce another
alignment. A series of such pairwise alignments that includes increasingly dissimilar sequences and
alignments of sequences at each iteration eventually creates the final alignment.
The problem with progressive methods is that the validity of the result varies greatly as a function of
the order in which pairs of sequences are aligned. Errors in the earlier alignments are propagated to
the later alignments. In addition, progressive strategies are a heuristic approach in that they don't
necessarily return best possible alignment. In addition to PILEUP, programs that use a progressive
strategy are CLUSTALW, CLUSTALX, MSA, and PRALINE.
Iterative Strategies
Because of the limitation of progressive strategies due to sensitivity to errors introduced by early
alignments, iterative methods have been developed that correct for the problem by repeatedly
realigning subgroups of sequences. Iterative methods include the use of genetic algorithms and
HMMs.
Approaches based on genetic algorithms generally start with a random definition of gap insertions
and deletions and use the alignment score as the fitness function. The pattern that defines the gaps
 
 
Search WWH ::




Custom Search