PicXAA: A Probabilistic Scheme for Finding the Maximum Expected Accuracy Alignment of Multiple Biological Sequences - Multiple Sequence Alignment Methods


accurate alignment results despite the low computational overhead. However, it does not work as well when applied to a set of divergent sequences that share only local similarities. Typically, the progressive scheme tends to propagate early-stage errors throughout the entire alignment process, which can be problematic when we need to align a set of sequences that prominently share local similarities but also possess many differences across sequence regions. In such a case, it may be difficult to build up the MSA through progressive alignment, as it may dilute local similarities and propagate errors that arise in divergent sequence regions. To date, several techniques have been developed to address this shortcoming of progressive alignment and alleviate these undesirable effects [17-20].

Recently, a novel alignment algorithm called PicXAA [20] has been proposed to address this problem by adopting a computationally efficient non-progressive scheme. Based on the maximum expected accuracy (MEA) principle, PicXAA aims to find the optimal alignment that maximizes the expected number of correctly aligned symbols (i.e., amino acids or nucleotides). Toward this goal, PicXAA first computes the posterior pairwise symbol alignment probability for all pairs of symbol locations in every sequence pair. Next, it updates the estimated probabilities through an improved probabilistic consistency transformation, which refines the symbol alignment probabilities of a given sequence pair by incorporating information from the other sequences. Using an efficient graph-based technique, PicXAA then greedily builds up the alignment based on the updated probabilities, starting from confidently alignable regions with high local similarities. Once the initial alignment is constructed, PicXAA goes through an iterative refinement process to further improve the alignment quality in divergent sequence regions that cannot be confidently aligned. In summary, PicXAA can accurately predict the global alignment of multiple biological sequences, in which local homologies are effectively captured. Experimental results confirm that PicXAA consistently yields accurate alignment results in various benchmarks, where the improvements are especially significant on reference sets that consist of sequences with only local similarities [20].
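The two probabilistic ingredients described above can be illustrated with a minimal Python sketch. It is not PicXAA's implementation: the transformation shown is the standard unweighted consistency transformation (as used in ProbCons-style aligners), not the improved variant of [20], and the greedy graph-based construction and iterative refinement are omitted. The function names, the dictionary layout `post[(x, y)]` (with x < y), and the column representation are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = List[List[float]]
# Hypothetical layout: post[(x, y)][i][j] is the posterior probability that
# symbol i of sequence x aligns with symbol j of sequence y, for every x < y.
Posteriors = Dict[Tuple[int, int], Matrix]


def _identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def _transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = _transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def _lookup(post: Posteriors, lengths: Sequence[int], x: int, y: int) -> Matrix:
    """Posterior matrix for (x, y); a sequence aligns to itself by identity."""
    if x == y:
        return _identity(lengths[x])
    return post[(x, y)] if x < y else _transpose(post[(y, x)])


def consistency_transform(post: Posteriors, lengths: Sequence[int]) -> Posteriors:
    """Standard (unweighted) consistency transformation, NOT PicXAA's variant.

    Each pairwise probability is replaced by the average, over every sequence
    z in the set, of the probability of aligning x_i to y_j through z.
    Assumes post holds a matrix for every pair x < y.
    """
    n = len(lengths)
    refined: Posteriors = {}
    for (x, y) in post:
        total = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in range(n):
            via_z = _matmul(_lookup(post, lengths, x, z),
                            _lookup(post, lengths, z, y))
            for i, row in enumerate(via_z):
                for j, value in enumerate(row):
                    total[i][j] += value
        refined[(x, y)] = [[v / n for v in row] for row in total]
    return refined


def expected_accuracy(post: Posteriors,
                      columns: Sequence[Sequence[Optional[int]]]) -> float:
    """Expected number of correctly aligned symbol pairs of an alignment.

    Each column lists, per sequence, the aligned symbol position or None
    for a gap; the score sums the posteriors of all pairs sharing a column.
    """
    score = 0.0
    for column in columns:
        present = [(s, pos) for s, pos in enumerate(column) if pos is not None]
        for (x, i), (y, j) in combinations(present, 2):
            score += post[(x, y)][i][j]
    return score
```

Under the MEA principle, an aligner searches for the set of columns that maximizes a score of this kind; the transformation step simply supplies better-informed probabilities before that search begins.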

2 Methods

PicXAA [20] aims to find the multiple sequence alignment with the maximum expected accuracy, i.e., the maximum expected number of correctly aligned residue pairs. Through a greedy approach, PicXAA probabilistically builds up the MSA by starting from high-similarity regions and proceeding toward more divergent regions that bear less similarity. In this way, PicXAA effectively avoids the error-propagation problem that many of the current
