Chapter 13
PicXAA: A Probabilistic Scheme for Finding
the Maximum Expected Accuracy Alignment
of Multiple Biological Sequences
Sayed Mohammad Ebrahim Sahraeian and Byung-Jun Yoon
Abstract
PicXAA is a probabilistic nonprogressive alignment algorithm that finds protein (or DNA) multiple
sequence alignments with maximum expected accuracy. PicXAA greedily builds up the alignment from
sequence regions with high local similarity, thereby yielding an accurate global alignment that effectively
captures the local similarities across sequences. PicXAA consistently yields accurate alignment results on a
wide range of reference sets that have different characteristics, with especially remarkable improvements
over other leading algorithms on sequence sets with high local similarities. In this chapter, we describe the
overall alignment strategy used in PicXAA and discuss several important considerations for effective
deployment of the algorithm.
Key words Multiple sequence alignment, Nonprogressive alignment, Maximum expected accuracy
(MEA), Probabilistic consistency transformation, PicXAA
1 Introduction
Multiple sequence alignment (MSA) is an indispensable tool in
comparative studies of biological sequences, and it plays a prominent
role in many applications such as phylogenetic analysis, structure
prediction, function prediction, motif discovery, and modeling
sequence homology [1-7]. The mathematically optimal MSA can
be found using dynamic programming. However, the computational
cost of this approach grows exponentially with the number of
sequences, which renders it impractical for aligning more than a
few sequences. For this
reason, the progressive alignment scheme, which successively
aligns pairs of sequences (or sequence profiles) along a phylogenetic
tree of the given sequences, has gained popularity as a practical
alternative [8-16]. In fact, the progressive alignment technique is
surprisingly effective for closely related sequences and it yields