Chapter 18
MSACompro: Improving Multiple Protein Sequence
Alignment by Predicted Structural Features
Xin Deng and Jianlin Cheng
Abstract
Multiple Sequence Alignment (MSA) is an essential tool in protein structure modeling, gene and protein
function prediction, DNA motif recognition, phylogenetic analysis, and many other bioinformatics tasks.
Therefore, improving the accuracy of multiple sequence alignment is an important long-term objective in
bioinformatics. We designed and developed a new method, MSACompro, that incorporates predicted secondary
structure, relative solvent accessibility, and residue-residue contact information into the currently most
accurate posterior probability-based MSA methods to improve the accuracy of multiple sequence alignments.
Unlike multiple sequence alignment methods that use the tertiary structure information
of some sequences, our method uses only structural information predicted from the sequences. In this
chapter, we first introduce some background and related techniques in the field of multiple sequence
alignment. Then, we describe the MSACompro algorithm in detail. Finally, we show that integrating
predicted protein structural information improves multiple sequence alignment accuracy.
Key words Multiple sequence alignment, Bioinformatics, Secondary structure, Solvent accessibility,
Residue-residue contact information, Posterior probability-based
1 Introduction
Multiple sequence alignment methods are central to many challenging
bioinformatics problems, such as protein function prediction,
protein homology identification, protein structure prediction,
protein interaction study, mutagenesis analysis, and phylogenetic
tree construction. Over the past few decades, a number of methods
and tools have been developed for multiple sequence alignment,
which have facilitated the development of the bioinformatics field.
Well-established techniques, such as iterative alignment [1],
progressive alignment [2], alignment based on profile hidden Markov
models [3], and posterior alignment probability transformation
[4, 5], have been widely adopted in state-of-the-art multiple sequence
alignment methods to enhance alignment accuracy. In addition, known
3D structure information is also used by some alignment methods,