Multiple Protein Sequence Alignment with MSAProbs - Multiple Sequence Alignment Methods


Chapter 14

Multiple Protein Sequence Alignment with MSAProbs

Yongchao Liu and Bertil Schmidt

Abstract

Multiple sequence alignment (MSA) generally constitutes the foundation of many bioinformatics studies involving functional, structural, and evolutionary relationship analysis between sequences. Because the exact approach to producing optimal multiple alignments has exponential computational complexity, the majority of state-of-the-art MSA algorithms are based on the progressive alignment heuristic. In this chapter, we outline MSAProbs, a parallelized MSA algorithm for protein sequences based on progressive alignment. To achieve high alignment accuracy, this algorithm employs a hybrid combination of a pair hidden Markov model and a partition function to calculate posterior probabilities. Furthermore, we provide some practical advice on the usage of the algorithm.

Key words: Multiple sequence alignment, Progressive alignment, Hidden Markov models, Partition function, Consistency-based scheme

1 Introduction

Multiple sequence alignment (MSA) is fundamental to many bioinformatics studies that analyze functional, structural, and evolutionary relationships between sequences. The exact approach to producing optimal MSAs relies on exhaustive dynamic programming. However, this approach has exponential computational complexity, which prohibits its use for large-scale data analysis. Therefore, many heuristics have been proposed to accelerate the computation of MSAs, among which the progressive alignment heuristic [1] is most widely used. However, the MSAs produced by these heuristics are generally suboptimal and may not meet the requirements of biologists. To further improve alignment accuracy, many modern progressive alignment-based MSA algorithms have fused other techniques into progressive alignment, such as introducing iterative refinement or consistency-based schemes.
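To make the exponential cost of the exact approach concrete, the following minimal Python sketch counts the work done by a straightforward k-dimensional dynamic programming lattice. It is an illustration only, not part of MSAProbs. It assumes the textbook formulation in which aligning k sequences fills one cell per combination of prefix lengths, and each cell considers every nonempty subset of sequences that may advance by one residue. The function name exact_msa_dp_cost is hypothetical.

```python
def exact_msa_dp_cost(lengths):
    """Estimate the size of a full k-dimensional DP lattice.

    lengths: list with the length of each sequence to be aligned.
    Returns (cells, moves_per_cell), where
      cells          = product of (n_i + 1) over all sequences, and
      moves_per_cell = 2**k - 1 predecessor cells examined per cell
                       (every nonempty subset of sequences may advance).
    Assumption: plain exhaustive DP without pruning or bounding.
    """
    cells = 1
    for n in lengths:
        cells *= n + 1
    k = len(lengths)
    moves_per_cell = 2 ** k - 1
    return cells, moves_per_cell


if __name__ == "__main__":
    # Show how the lattice grows as more sequences of 100 residues are added.
    for k in (2, 3, 5, 10):
        cells, moves = exact_msa_dp_cost([100] * k)
        print(f"k={k:2d}: {cells:.3e} cells, {moves} predecessor moves per cell")
```

Under these assumptions, two sequences of 100 residues need 101 × 101 = 10,201 cells, while each additional sequence multiplies the cell count by 101 and roughly doubles the work per cell. This growth is what motivates heuristics such as progressive alignment.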
