by incorporating the information from other sequences in the set S.
The idea of PCT was originally proposed in [10] and has since been
widely adopted by many alignment algorithms. It has been shown that
such a transformation can ultimately lead to a more consistent and
accurate MSA. The improved PCT, first proposed and adopted in
PicXAA [20], refines the original PCT
by considering the relative significance of each intermediate
sequence z ∈ S \ {x, y} while transforming the pairwise alignment
probabilities, originally estimated using pair-HMMs, the partition
function method, or structural pair-HMMs (see ref. 20 for details).
The improved transformation is defined as:
$$
P'(x_i \sim y_j \mid S) =
\frac{\sum_{z \in S} P(x \sim z) \left[ \sum_{z_k \in z} P(x_i \sim z_k \mid x, z)\, P(z_k \sim y_j \mid z, y) \right] P(z \sim y)}
     {\sum_{z \in S} P(x \sim z)\, P(z \sim y)}
$$
where P(x ~ z) is the probability that sequences x and z are
homologous to each other. This probability P(x ~ z) is estimated
by computing the average residue alignment probability in the
optimal pairwise alignment between x and z. This transformation
can be applied for more than one round of iterations (see Note 8).
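
As an illustration of the transformation above, the following is a minimal Python sketch, not the PicXAA implementation. The function names (`homology_weight`, `improved_pct`) and the data layout (nested lists keyed by sequence-id pairs) are hypothetical. The sketch also assumes, following the usual consistency-transformation convention, that a sequence aligns to itself with an identity posterior matrix and a homology weight of 1, so that the terms z = x and z = y are well defined.

```python
def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def homology_weight(posterior, alignment):
    """Estimate P(x ~ z) as the average posterior probability over the
    aligned residue pairs (i, k) of one pairwise alignment.

    `alignment` is assumed to be the optimal pairwise alignment, supplied
    by the caller as a list of (i, k) index pairs.
    """
    if not alignment:
        return 0.0
    return sum(posterior[i][k] for i, k in alignment) / len(alignment)


def improved_pct(x, y, lengths, post, weight):
    """Weighted consistency transformation for the sequence pair (x, y).

    lengths: dict, sequence id -> sequence length (the set S)
    post:    dict, (a, b) -> matrix with post[(a, b)][i][j] = P(a_i ~ b_j | a, b);
             must contain the keys (x, z) and (z, y) for every other z
    weight:  dict, (a, b) -> P(a ~ b), same keys as `post`
    """
    def p(a, b):
        # Assumption: a sequence aligned to itself has an identity posterior.
        return identity(lengths[a]) if a == b else post[(a, b)]

    def w(a, b):
        # Assumption: a sequence is homologous to itself with probability 1.
        return 1.0 if a == b else weight[(a, b)]

    nx, ny = lengths[x], lengths[y]
    numer = [[0.0] * ny for _ in range(nx)]
    denom = 0.0
    for z, nz in lengths.items():
        wz = w(x, z) * w(z, y)          # P(x ~ z) * P(z ~ y)
        denom += wz
        pxz, pzy = p(x, z), p(z, y)
        for i in range(nx):
            for k in range(nz):
                for j in range(ny):
                    numer[i][j] += wz * pxz[i][k] * pzy[k][j]
    if denom == 0.0:
        raise ValueError("all homology weights are zero")
    return [[v / denom for v in row] for row in numer]
```

Applying the transformation for more than one round would amount to feeding the returned matrices back in as `post` for every sequence pair; the triple loop is written for clarity rather than speed.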
2.2 Construction of the Alignment Graph

To find the alignment that maximizes the number of correctly
aligned residues and effectively captures the local similarities between
the given sequences, PicXAA constructs the MSA by adding one
aligned residue pair at a time, starting from the most confidently
alignable regions (i.e., residue pairs with high alignment
probabilities) and progressing towards less confident regions (i.e.,
residue pairs with relatively low alignment probabilities) (see Note 5).
During this process, PicXAA preserves the internal consistency
of the alignment by avoiding any conflicts between the current
alignment and the potential residue pair to be added to the
alignment. In order to verify this compatibility in an efficient manner,
PicXAA adopts a graph-based strategy for building up the
alignment. In this approach, the MSA is represented as a directed acyclic
graph G, where the nodes in G correspond to the columns in the
alignment and the directed edges between nodes reflect the relative
order of the corresponding columns in the final sequence
alignment. To construct the alignment graph G, PicXAA first sorts all
possible residue pairs for all pairs of sequences in S according to the
consistency transformed posterior alignment probabilities, in
descending order, to get an ordered set P. Starting from the most
probable residue pair, we successively add residue pairs in P to the
alignment graph G one pair at a time, provided that the pair being
added to the alignment is compatible with the current alignment
graph. This compatibility can be easily verified by finding out
whether the graph remains acyclic after adding the new residue pair.
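
The greedy construction described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not PicXAA's actual data structure: the function name `build_alignment_columns` is hypothetical, columns are tracked with a simple union-find, and acyclicity is tested with a plain depth-first reachability search rather than whatever efficient update scheme PicXAA uses. Each residue starts as its own column, and the column holding residue i of a sequence has an edge to the column holding residue i + 1 of the same sequence.

```python
def build_alignment_columns(seq_lengths, scored_pairs):
    """Greedily merge residues into alignment columns.

    seq_lengths:  dict, sequence id -> length
    scored_pairs: iterable of (probability, (s, i), (t, j)) residue pairs
    Returns the columns as a sorted list of sorted lists of (seq id, position).
    """
    # Every residue starts in its own column (union-find parent map).
    parent = {(s, i): (s, i) for s, n in seq_lengths.items() for i in range(n)}
    members = {r: [r] for r in parent}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    def successors(col):
        # Column of residue (s, i) precedes the column of residue (s, i + 1).
        for s, i in members[col]:
            if i + 1 < seq_lengths[s]:
                yield find((s, i + 1))

    def reaches(src, dst):
        # Depth-first search: is there a directed path from src to dst?
        stack, seen = [src], {src}
        while stack:
            col = stack.pop()
            for nxt in successors(col):
                if nxt == dst:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    # Process residue pairs from most to least probable.
    for _, a, b in sorted(scored_pairs, key=lambda t: t[0], reverse=True):
        ca, cb = find(a), find(b)
        if ca == cb:
            continue  # already in the same column
        if reaches(ca, cb) or reaches(cb, ca):
            continue  # merging the two columns would create a cycle
        parent[cb] = ca
        members[ca].extend(members.pop(cb))

    return sorted(sorted(col) for col in members.values())
```

A path between two columns means one must come strictly before the other, so merging them would introduce a cycle; this single test also rejects pairs that would place two residues of the same sequence in one column. A topological ordering of the remaining columns would then give the column order of the final alignment.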