Once the partition function is constructed, the posterior probability of $x_i$ aligned to $y_j$ can be computed as

$$P(x_i \sim y_j) = \frac{Z_{i-1,j-1}\, Z'_{i+1,j+1}\, e^{s(x_i,y_j)/T}}{Z}, \qquad (9)$$

where $Z'_{i,j}$ is the partition function of all alignments of the subsequences $x_i \ldots x_m$ and $y_j \ldots y_n$ ($Z'^{M}_{i,j}$ denotes the partition function of those alignments beginning with $x_i$ paired with $y_j$), and $m$ and $n$ are the lengths of $x$ and $y$, respectively. This can be computed using standard backward recursion formulas [3]. In the above equation, $Z_{i-1,j-1}/Z$ and $Z'_{i+1,j+1}/Z$ represent the probabilities of feasible suboptimal alignments (as determined by the $T$ parameter) of $x_1 \ldots x_{i-1}$ and $y_1 \ldots y_{j-1}$, and of $x_{i+1} \ldots x_m$ and $y_{j+1} \ldots y_n$, respectively. Thus, the equation weighs alignments according to their partition function probabilities and estimates $P(x_i \sim y_j)$ as the sum of the probabilities of all alignments in which $x_i$ is paired with $y_j$.
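As a rough illustration of Eq. (9), the following Python sketch computes forward and backward partition functions for a pair of sequences and combines them into the posterior matrix. It is not Probalign's implementation: the gap model is simplified to a single linear penalty per gapped residue, and the names `posterior_matrix` and `toy_score` (a made-up +1/-1 substitution score) are hypothetical. Arrays are 1-based so indices match the text; `Zf[i][j]` plays the role of $Z_{i,j}$ and `Zb[i][j]` that of $Z'_{i,j}$.

```python
import math


def toy_score(a, b):
    # Hypothetical substitution score: +1 for identical residues, -1 otherwise.
    return 1.0 if a == b else -1.0


def posterior_matrix(x, y, score, gap=-2.0, T=1.0):
    """Posterior P(x_i ~ y_j) from forward and backward partition functions.

    Simplified model (assumption): every gapped residue costs `gap`.
    Returns (P, Z); P is 1-based, P[i][j] for 1 <= i <= m, 1 <= j <= n.
    """
    m, n = len(x), len(y)
    eg = math.exp(gap / T)  # Boltzmann weight of one gapped residue

    def w(i, j):
        # Boltzmann weight e^{s(x_i, y_j)/T} of pairing x_i with y_j (1-based)
        return math.exp(score(x[i - 1], y[j - 1]) / T)

    # Forward: Zf[i][j] sums weights of all alignments of x_1..x_i and y_1..y_j.
    Zf = [[0.0] * (n + 2) for _ in range(m + 2)]
    Zf[0][0] = 1.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                total += Zf[i - 1][j - 1] * w(i, j)
            if i > 0:
                total += Zf[i - 1][j] * eg
            if j > 0:
                total += Zf[i][j - 1] * eg
            Zf[i][j] = total

    # Backward: Zb[i][j] sums weights of all alignments of x_i..x_m and y_j..y_n.
    Zb = [[0.0] * (n + 2) for _ in range(m + 2)]
    Zb[m + 1][n + 1] = 1.0
    for i in range(m + 1, 0, -1):
        for j in range(n + 1, 0, -1):
            if i == m + 1 and j == n + 1:
                continue
            total = 0.0
            if i <= m and j <= n:
                total += Zb[i + 1][j + 1] * w(i, j)
            if i <= m:
                total += Zb[i + 1][j] * eg
            if j <= n:
                total += Zb[i][j + 1] * eg
            Zb[i][j] = total

    Z = Zf[m][n]
    assert math.isclose(Z, Zb[1][1])  # both directions give the same total

    # Eq. (9): prefix weight * pair weight * suffix weight, normalised by Z.
    P = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            P[i][j] = Zf[i - 1][j - 1] * w(i, j) * Zb[i + 1][j + 1] / Z
    return P, Z


P, Z = posterior_matrix("ACD", "ACD", toy_score)
```

Lowering `T` concentrates the posterior mass on the highest-scoring alignment, while raising it spreads the mass over more suboptimal alignments, which is the role the text assigns to the $T$ parameter.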
2.4 Maximal Expected Accuracy Alignment
Given the posterior probability matrix $P(x_i \sim y_j)$, we define the expected accuracy of an alignment $a$ of $x$ and $y$ as

$$\frac{1}{\min\{|x|,\,|y|\}} \sum_{(x_i,\,y_j) \in a} P(x_i \sim y_j). \qquad (10)$$
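Eq. (10) is a plain sum over the aligned residue pairs. A minimal Python sketch, assuming the alignment $a$ is given as a list of aligned 1-based index pairs $(i, j)$ and the posteriors as a 1-based nested list (both representations, the name `expected_accuracy`, and the toy matrix values are assumptions made for illustration):

```python
def expected_accuracy(pairs, P, len_x, len_y):
    """Eq. (10): summed posteriors of the aligned pairs, divided by min(|x|, |y|).

    pairs: aligned residue pairs (i, j), 1-based; gapped residues are simply absent.
    P: 1-based nested list with P[i][j] = P(x_i ~ y_j).
    """
    return sum(P[i][j] for i, j in pairs) / min(len_x, len_y)


# Toy 3 x 3 posterior matrix (row/column 0 unused); values are made up.
P_toy = [[0.0] * 4] + [
    [0.0] + [0.9 if i == j else 0.05 for j in range(1, 4)] for i in range(1, 4)
]
acc_diag = expected_accuracy([(1, 1), (2, 2), (3, 3)], P_toy, 3, 3)
acc_shift = expected_accuracy([(1, 2), (2, 3)], P_toy, 3, 3)
```

Because the normalisation uses the shorter sequence, an alignment that pairs every residue of the shorter sequence with posterior 1 scores exactly 1.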
The maximum expected accuracy alignment score is computed by dynamic programming using the following recurrence, described in Durbin [3]:
for $i = 1$ to $|x|$
  for $j = 1$ to $|y|$

$$A(i,j) = \max \begin{cases} A(i-1,\,j-1) + P(x_i \sim y_j) \\ A(i-1,\,j) \\ A(i,\,j-1) \end{cases} \qquad (11)$$
The first row and column of $A$ are set to 0. The alignment score is given by $A(|x|,|y|)$, where $|x|$ and $|y|$ denote the lengths of sequences $x$ and $y$. The actual alignment of $x$ and $y$ can be recovered by keeping track of which cell the maximum value is obtained from for each entry of $A$ [3].
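Recurrence (11) and the traceback can be sketched in Python as follows. This assumes the posteriors are supplied as a 1-based nested list `P` with `P[i][j]` $= P(x_i \sim y_j)$; that representation, the name `mea_align`, and the toy matrix are choices made for this sketch. Ties are broken in favour of pairing $x_i$ with $y_j$, an arbitrary choice the text does not specify.

```python
def mea_align(P, m, n):
    """Maximum expected accuracy alignment via recurrence (11) plus traceback.

    P: 1-based nested list, P[i][j] = P(x_i ~ y_j), 1 <= i <= m, 1 <= j <= n.
    Returns (A(m, n), list of aligned 1-based pairs (i, j)).
    """
    A = [[0.0] * (n + 1) for _ in range(m + 1)]     # first row/column stay 0
    ptr = [[None] * (n + 1) for _ in range(m + 1)]  # which cell gave the max
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            options = [
                (A[i - 1][j - 1] + P[i][j], "diag"),  # pair x_i with y_j
                (A[i - 1][j], "up"),                  # leave x_i unpaired
                (A[i][j - 1], "left"),                # leave y_j unpaired
            ]
            # max() returns the first maximal option, so ties favour "diag"
            A[i][j], ptr[i][j] = max(options, key=lambda t: t[0])

    # Walk back from A(m, n), collecting the paired residues.
    pairs = []
    i, j = m, n
    while i > 0 and j > 0:
        move = ptr[i][j]
        if move == "diag":
            pairs.append((i, j))
            i, j = i - 1, j - 1
        elif move == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return A[m][n], pairs


# Toy 3 x 3 posterior matrix (row/column 0 unused); values are made up.
P_toy = [[0.0] * 4] + [
    [0.0] + [0.9 if i == j else 0.05 for j in range(1, 4)] for i in range(1, 4)
]
score, pairs = mea_align(P_toy, 3, 3)
```

Residues of $x$ or $y$ that do not appear in the returned pairs are the ones placed against gaps.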
Both Probcons and Probalign first estimate posterior probabilities of amino acid residue pairings for all pairs of protein sequences in the input. Probcons introduced a number of new approaches for constructing a multiple alignment from the posterior probabilities of all pairs of sequences. It first performs a probabilistic consistency transformation to improve the posterior probabilities with the aid of a third sequence [12]. It then adapts three standard approaches in multiple sequence alignment, namely construction of a guide-tree,