Chapter 14
Multiple Protein Sequence Alignment with MSAProbs
Yongchao Liu and Bertil Schmidt
Abstract
Multiple sequence alignment (MSA) is the foundation of many bioinformatics studies that analyze
functional, structural, and evolutionary relationships between sequences. Because the exact approach to
computing optimal multiple alignments has exponential computational complexity, most state-of-the-art
MSA algorithms are based on the progressive alignment heuristic. In this chapter, we outline MSAProbs,
a parallelized MSA algorithm for protein sequences based on progressive alignment. To achieve high
alignment accuracy, this algorithm combines a pair hidden Markov model and a partition function to
calculate posterior probabilities. In addition, we provide practical advice on using the algorithm.
Key words Multiple sequence alignment, Progressive alignment, Hidden Markov models, Partition function, Consistency-based scheme
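MSAProbs uses posterior probabilities that residue x_i of one sequence is aligned with residue y_j of another. For the pair-HMM part of that computation, the standard technique is the forward-backward algorithm. The following minimal Python sketch illustrates it for a textbook three-state pair HMM (match state M, state X for x_i against a gap, state Y for y_j against a gap). The parameter values DELTA, EPS and TAU, the toy emission function p_match, and the function name match_posteriors are all hypothetical. MSAProbs's actual model parameters, its partition-function posteriors, and the way it combines the two are not reproduced here. The sketch also works in plain probabilities, which is only safe for short sequences; practical implementations use log space or rescaling.

```python
# Hypothetical toy parameters, chosen only for illustration (NOT MSAProbs's values).
DELTA, EPS, TAU = 0.05, 0.4, 0.01  # gap-open, gap-extend, and end probabilities
K = 20                              # alphabet size (20 amino acids)
Q = 1.0 / K                         # uniform emission probability in gap states


def p_match(a, b):
    """Toy joint emission probability of aligning residue a with residue b."""
    return 0.6 / K if a == b else 0.4 / (K * (K - 1))


def match_posteriors(x, y):
    """Return (post, forward total, backward total), post[i][j] = P(x_i ~ y_j | x, y)."""
    n, m = len(x), len(y)

    def zeros():
        return [[0.0] * (m + 1) for _ in range(n + 1)]

    # Forward pass: f*[i][j] = probability of emitting x[:i], y[:j] and being in that state.
    fM, fX, fY = zeros(), zeros(), zeros()
    fM[0][0] = 1.0  # the begin state behaves like the match state
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p_match(x[i - 1], y[j - 1]) * (
                    (1 - 2 * DELTA - TAU) * fM[i - 1][j - 1]
                    + (1 - EPS - TAU) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:  # X state: x_i aligned to a gap
                fX[i][j] = Q * (DELTA * fM[i - 1][j] + EPS * fX[i - 1][j])
            if j > 0:  # Y state: y_j aligned to a gap
                fY[i][j] = Q * (DELTA * fM[i][j - 1] + EPS * fY[i][j - 1])
    total_f = TAU * (fM[n][m] + fX[n][m] + fY[n][m])

    # Backward pass: b*[i][j] = probability of emitting the remaining suffixes from that state.
    bM, bX, bY = zeros(), zeros(), zeros()
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                bM[i][j] = bX[i][j] = bY[i][j] = TAU
                continue
            # Contributions of the three possible next emissions (0 past the sequence end).
            nm = p_match(x[i], y[j]) * bM[i + 1][j + 1] if i < n and j < m else 0.0
            nx = Q * bX[i + 1][j] if i < n else 0.0
            ny = Q * bY[i][j + 1] if j < m else 0.0
            bM[i][j] = (1 - 2 * DELTA - TAU) * nm + DELTA * (nx + ny)
            bX[i][j] = (1 - EPS - TAU) * nm + EPS * nx
            bY[i][j] = (1 - EPS - TAU) * nm + EPS * ny

    # Posterior of a match between x_i and y_j (1-based residues -> 0-based matrix).
    post = [[fM[i][j] * bM[i][j] / total_f for j in range(1, m + 1)]
            for i in range(1, n + 1)]
    return post, total_f, bM[0][0]
```

For example, `match_posteriors("HEAGAWGHEE", "PAWHEAE")` returns a 10 × 7 matrix. Each row sums to at most 1 because, on any single alignment path, a residue is matched to at most one residue of the other sequence. The forward and backward totals agreeing is a useful sanity check.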
1 Introduction
Multiple sequence alignment (MSA) is fundamental to many bioinformatics
studies that analyze functional, structural, and evolutionary
relationships between sequences. The exact approach to producing
optimal MSAs relies on exhaustive dynamic programming. However,
the running time and memory of this approach grow exponentially
with the number of sequences, which prohibits its use for large-scale
data analysis. Therefore, many heuristics have been proposed to
accelerate the computation of MSAs, among which the progressive
alignment heuristic [1] is the most widely used. The MSAs produced
by these heuristics, however, are generally suboptimal and may not
meet the requirements of biologists. To further improve alignment
accuracy, many modern progressive alignment-based MSA algorithms
incorporate additional techniques into progressive alignment, such
as iterative refinement or consistency-based schemes.
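To make the exponential complexity concrete: for N sequences of length L, exhaustive dynamic programming fills an N-dimensional table of (L+1)^N cells. In the simplest formulation, each cell examines 2^N − 1 predecessor moves. The short Python sketch below contrasts that count with a deliberately rough, hypothetical cost model for progressive alignment. The function names and the cost model are illustrative assumptions, not measurements of any particular program.

```python
def exact_msa_work(n_seqs: int, length: int) -> int:
    """Cell updates for exhaustive DP over n_seqs sequences of `length` residues.

    The DP table has (length + 1) ** n_seqs cells, and each cell takes the best
    of 2 ** n_seqs - 1 predecessor moves (every non-empty subset of sequences
    advancing by one residue).
    """
    return (length + 1) ** n_seqs * (2 ** n_seqs - 1)


def progressive_work_estimate(n_seqs: int, length: int) -> int:
    """Rough cell count for a progressive aligner (hypothetical cost model).

    Assumes N(N-1)/2 pairwise alignments to build the guide tree plus N-1
    profile merges, each charged as one (length + 1) ** 2 DP table. Profile
    growth through gaps and per-column profile scoring are ignored.
    """
    pairwise = n_seqs * (n_seqs - 1) // 2
    merges = n_seqs - 1
    return (pairwise + merges) * (length + 1) ** 2


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(f"N={n:2d}  exact={exact_msa_work(n, 300):.3e}  "
              f"progressive~{progressive_work_estimate(n, 300):.3e}")
```

Even at L = 300, going from 5 to 10 sequences multiplies the exhaustive count by a factor of about 8 × 10^13. By contrast, the progressive estimate grows only quadratically in N. This gap is why heuristics such as progressive alignment dominate in practice.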