accurate alignment results despite the low computational overhead.
However, it does not work as well when applied to a set of
divergent sequences that share only local similarities. The
progressive scheme tends to propagate early-stage errors
throughout the entire alignment process, which is problematic
when the sequences share prominent local similarities but differ
substantially elsewhere. In such cases, it may be difficult to build
up the MSA through progressive alignment, because the procedure may
dilute local similarities and carry forward errors that arise in
divergent sequence regions. Several techniques have been developed
to address this shortcoming of progressive alignment and alleviate
these undesirable effects [17-20].
Recently, a novel alignment algorithm called PicXAA [20] has
been proposed to address this problem by adopting a computationally
efficient non-progressive scheme. Based on the maximum expected
accuracy (MEA) principle, PicXAA aims to find the optimal alignment
that maximizes the expected number of correctly aligned symbols
(i.e., amino acids or nucleotides). To this end, PicXAA first
computes, for every sequence pair, the posterior probability that
each pair of symbol positions is aligned. Next, it updates the
estimated probabilities through an improved probabilistic
consistency transformation, which refines the symbol alignment
probabilities of a given sequence pair by incorporating information
from the other sequences. Using an efficient graph-based technique,
PicXAA then greedily builds up the alignment based on the updated
probabilities, starting from confidently alignable regions with high
local similarities. Once the initial alignment is constructed,
PicXAA goes through an iterative refinement process to further
improve the alignment quality in divergent sequence regions that
cannot be confidently aligned. In summary, PicXAA accurately
predicts a global alignment of multiple biological sequences that
effectively captures local homologies. Experimental results confirm
that PicXAA consistently yields accurate alignments on various
benchmarks, with especially significant improvements on reference
sets that consist of sequences with only local similarities [20].
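
The idea behind the consistency step can be illustrated with a minimal sketch. It assumes the simple unweighted consistency transformation used by ProbCons-style aligners, in which the updated probability that x_i aligns with y_j is the average, over every sequence z in the set, of the sum over k of P(x_i ~ z_k) P(z_k ~ y_j). It is not PicXAA's improved variant, whose details are not given in this section. The names `consistency_transform`, `post`, and `lengths` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def consistency_transform(post: Dict[Tuple[int, int], Matrix],
                          lengths: List[int]) -> Dict[Tuple[int, int], Matrix]:
    """Unweighted consistency transformation (an assumption, see lead-in).

    post[(x, y)] with x < y holds P(x_i ~ y_j) as a lengths[x]-by-lengths[y]
    matrix. The result has the same keys and shapes.
    """
    n_seqs = len(lengths)

    def lookup(x: int, y: int) -> Matrix:
        # A sequence aligned with itself is the identity; the reverse
        # direction of a stored pair is its transpose.
        if x == y:
            return identity(lengths[x])
        return post[(x, y)] if x < y else transpose(post[(y, x)])

    updated: Dict[Tuple[int, int], Matrix] = {}
    for (x, y) in post:
        total = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in range(n_seqs):
            # Evidence for x_i ~ y_j routed through every position of z.
            via_z = matmul(lookup(x, z), lookup(z, y))
            for i in range(lengths[x]):
                for j in range(lengths[y]):
                    total[i][j] += via_z[i][j] / n_seqs
        updated[(x, y)] = total
    return updated
```

For three single-symbol sequences with `post = {(0, 1): [[0.5]], (0, 2): [[1.0]], (1, 2): [[1.0]]}`, calling `consistency_transform(post, [1, 1, 1])` raises the probability for the pair (0, 1) from 0.5 to about 0.67, because the third sequence aligns confidently with both of the others and so supports their alignment with each other.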
2 Methods
PicXAA [20] aims to find the multiple sequence alignment with the
maximum expected accuracy, i.e., the maximum expected number
of correctly aligned residue pairs. Using a greedy approach,
PicXAA probabilistically builds up the MSA by starting from
high-similarity regions and proceeding towards more divergent
regions. In this way, PicXAA effectively
avoids the error-propagation problem that many of the current