Chapter 9
Multiple Sequence Alignment Using Probcons and Probalign
Usman Roshan
Abstract
Sequence alignment remains a fundamental task in bioinformatics. The literature contains programs that
employ a host of exact and heuristic strategies drawn from computer science. Probcons was the first program
to construct maximum expected accuracy sequence alignments with hidden Markov models and, at the time
of its publication, achieved the highest accuracies on standard protein multiple alignment benchmarks.
Probalign followed the same strategy but used a partition function approach instead of hidden Markov
models. Several programs employing both strategies have been published since then. In this chapter we
describe Probcons and Probalign.
Key words: Sequence alignment, Expected accuracy, Hidden Markov models, Partition function
1 Introduction
Multiple protein sequence alignment is one of the most common
tasks in bioinformatics [1]. Its widespread applications include
detecting functional regions in proteins [2] and reconstructing
complex evolutionary histories [1, 3]. Techniques for
constructing accurate alignments are therefore of great interest to
the bioinformatics community.
ClustalW [4] is one of the earliest multiple sequence aligners
and remains popular. Other programs include Dialign [5],
T-Coffee [6], MUSCLE [7], and MAFFT [8]. Given the importance
of multiple sequence alignment, several protein alignment
benchmarks have been created for unbiased assessment of
alignment accuracy. Of these, BAliBASE [9-11] is by far the most
commonly used. The BAliBASE benchmark alignments are computed
by superimposing protein structures.
Prior to Probcons [12], most programs optimized the sum-of-pairs
score of a multiple alignment or computed the Viterbi alignment [3].
Probcons instead computes the maximum expected accuracy
alignment. The expected accuracy of an alignment is
based upon posterior probabilities of residues [3, 12-14].
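
To make the idea of an expected-accuracy alignment concrete, the following is a minimal sketch for the pairwise case, not the Probcons or Probalign implementation. It assumes a posterior matrix is already available (the name post and the function mea_alignment are hypothetical) in which post[i][j] is the posterior probability that residue i of the first sequence is aligned to residue j of the second. It also assumes that the objective is the sum of the posteriors over aligned residue pairs, maximized by a Needleman-Wunsch style dynamic program with no gap penalties.

```python
def mea_alignment(post):
    """Sketch: find the set of aligned residue pairs that maximizes the
    sum of posterior probabilities (an assumed expected-accuracy objective).

    post[i][j] is an assumed, precomputed posterior probability that
    residue i of sequence x aligns with residue j of sequence y.
    Returns (score, pairs) where pairs is a list of (i, j) index pairs.
    """
    m = len(post)
    n = len(post[0]) if m else 0

    # dp[i][j]: best total posterior using the first i residues of x
    # and the first j residues of y. Gaps carry no penalty here.
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + post[i - 1][j - 1],  # align x[i-1] with y[j-1]
                dp[i - 1][j],                           # leave x[i-1] unaligned
                dp[i][j - 1],                           # leave y[j-1] unaligned
            )

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = m, n
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + post[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif dp[i][j] == dp[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[m][n], pairs


if __name__ == "__main__":
    # Toy posterior matrix for two sequences of length 2 (made-up values).
    toy = [[0.9, 0.05],
           [0.05, 0.8]]
    score, pairs = mea_alignment(toy)
    print(score, pairs)
```

In this sketch the posteriors are simply given. As the abstract notes, Probcons derives them from a hidden Markov model and Probalign from a partition function, and both programs address multiple rather than pairwise alignment.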