Suppose we are going to align a protein sequence group $S$, in which protein sequences $X$ and $Y$ are considered as two representatives. The sequences of $X$ and $Y$ are denoted as $X = (x_1, x_2, \ldots, x_{n_1})$ and $Y = (y_1, y_2, \ldots, y_{n_2})$, where $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ are lists of the residues in $X$ and $Y$, respectively. $n_1$ and $n_2$ are the lengths of sequences $X$ and $Y$, respectively. $x_i$ is the $i$-th amino acid in sequence $X$, and $y_j$ is the $j$-th amino acid in sequence $Y$. We let $aln$ represent a global alignment between $X$ and $Y$, $ALN$ the set of all possible global alignments of $X$ and $Y$, and $aln^{*} \in ALN$ the true pairwise alignment of $X$ and $Y$. Following MSAProbs, the posterior probability that the $i$-th residue in $X$ ($x_i$) is aligned to the $j$-th residue in $Y$ ($y_j$) in $aln^{*}$ is defined as:
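$$
p(x_i \sim y_j \in aln^{*} \mid X, Y) = \sum_{aln \in ALN} P(aln \mid X, Y)\, I\{x_i \sim y_j \in aln\}, \quad (1 \le i \le n_1,\ 1 \le j \le n_2) \tag{1}
$$

where

$$
I\{x_i \sim y_j \in aln\} =
\begin{cases}
1, & \text{if } (x_i \sim y_j \in aln) \text{ is true}, \\
0, & \text{otherwise}.
\end{cases}
$$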
$P(aln \mid X, Y)$ is the posterior probability that $aln$ is the true alignment $aln^{*}$. Thus, the $n_1 \times n_2$ posterior probability matrix $P_{XY}$ is a matrix including all the values $p(x_i \sim y_j \in aln^{*} \mid X, Y)$ ($p(x_i \sim y_j)$ for short) for $1 \le i \le n_1$, $1 \le j \le n_2$. The calculation process of the pairwise posterior probability matrix is described below.
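To make the sum in Eq. (1) concrete, the following is a minimal sketch that accumulates a posterior matrix from an explicit list of alignments. The function name `posterior_matrix`, the representation of an alignment as a set of matched 1-based index pairs, and the toy probabilities are all illustrative assumptions; in practice the set $ALN$ is far too large to enumerate, which is why dynamic programming is used instead.

```python
def posterior_matrix(alignments, n1, n2):
    """Accumulate p(x_i ~ y_j) as in Eq. (1).

    alignments: list of (probability, pairs), where pairs is a set of
                1-based (i, j) tuples meaning x_i is aligned to y_j.
                (Hypothetical representation chosen for this sketch.)
    Returns an n1 x n2 nested list P with P[i-1][j-1] = p(x_i ~ y_j).
    """
    P = [[0.0] * n2 for _ in range(n1)]
    for prob, pairs in alignments:
        # The indicator I{x_i ~ y_j in aln} is 1 exactly for the pairs
        # listed in this alignment, so each one receives P(aln | X, Y).
        for i, j in pairs:
            P[i - 1][j - 1] += prob
    return P


# Toy example for |X| = |Y| = 2. Only three alignments are given
# non-zero posterior probability (assumed values, not real data).
toy_alignments = [
    (0.5, {(1, 1), (2, 2)}),  # x1~y1, x2~y2
    (0.3, {(1, 1)}),          # x1~y1, x2 and y2 each face a gap
    (0.2, {(2, 1)}),          # x2~y1, x1 and y2 each face a gap
]
P = posterior_matrix(toy_alignments, n1=2, n2=2)
```

In this toy case $p(x_1 \sim y_1)$ is the sum of the first two alignment probabilities, $0.5 + 0.3 = 0.8$, because both alignments contain that pair.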
The pairwise posterior probability matrix in MSACompro is a combination of two types of pairwise posterior probability matrices, calculated by two different methods: a pair hidden Markov model and a partition function. The first kind of pairwise probability matrix is calculated by a partition function ($F$) of alignments based on dynamic programming. $F(i, j)$ represents the probability of all partial global alignments of $X$ and $Y$ ending at position $(i, j)$. Before discussing the calculation of $F(i, j)$, three other probabilities are introduced: $F^{M}(i, j)$, the probability of all partial global alignments with $x_i$ aligned to $y_j$; $F^{Y}(i, j)$, the probability of all partial global alignments with $y_j$ aligned to a gap; and $F^{X}(i, j)$, the probability of all partial global alignments with $x_i$ aligned to a gap. Accordingly, $F(i, j)$ can be calculated recursively as follows:
$$
\begin{aligned}
F^{M}(i, j) &= F(i-1, j-1)\, e^{W_1 \beta s(x_i, y_j) + W_2\, SS(ss(x_i), ss(y_j)) + W_3\, SA(sa(x_i), sa(y_j))}, \\
F^{Y}(i, j) &= F^{M}(i, j-1)\, e^{\beta\, gap} + F^{Y}(i, j-1)\, e^{\beta\, ext}, \\
F^{X}(i, j) &= F^{M}(i-1, j)\, e^{\beta\, gap} + F^{X}(i-1, j)\, e^{\beta\, ext}, \\
F(i, j) &= F^{M}(i, j) + F^{Y}(i, j) + F^{X}(i, j).
\end{aligned}
\tag{2}
$$
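The recursion in Eq. (2) can be sketched as a forward dynamic-programming pass. This is an illustration under stated assumptions, not MSACompro's implementation: the function name `partition_function` is hypothetical, the whole exponent of the $F^{M}$ term is supplied by the caller through a hypothetical callable `match_exponent(i, j)`, the boundary condition $F(0,0) = F^{M}(0,0) = 1$ is assumed because the text does not state one, and `gap` and `ext` are assumed to be non-positive gap-open and gap-extension scores.

```python
import math


def partition_function(n1, n2, match_exponent, beta, gap, ext):
    """Fill the tables F, F^M, F^Y, F^X of Eq. (2) for 0 <= i <= n1, 0 <= j <= n2.

    match_exponent(i, j): exponent of the F^M term for x_i and y_j (1-based);
                          hypothetical callable standing in for the weighted
                          score in Eq. (2).
    beta:                 scaling parameter applied to the gap scores.
    gap, ext:             gap-open and gap-extension scores (assumed <= 0).
    """
    rows, cols = n1 + 1, n2 + 1
    FM = [[0.0] * cols for _ in range(rows)]
    FY = [[0.0] * cols for _ in range(rows)]
    FX = [[0.0] * cols for _ in range(rows)]
    F = [[0.0] * cols for _ in range(rows)]

    # Assumed boundary condition: the empty alignment has weight 1.
    FM[0][0] = 1.0
    F[0][0] = 1.0

    e_gap = math.exp(beta * gap)
    e_ext = math.exp(beta * ext)

    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                # x_i aligned to y_j
                FM[i][j] = F[i - 1][j - 1] * math.exp(match_exponent(i, j))
            if j > 0:
                # y_j aligned to a gap: open after a match or extend a gap
                FY[i][j] = FM[i][j - 1] * e_gap + FY[i][j - 1] * e_ext
            if i > 0:
                # x_i aligned to a gap: open after a match or extend a gap
                FX[i][j] = FM[i - 1][j] * e_gap + FX[i - 1][j] * e_ext
            F[i][j] = FM[i][j] + FY[i][j] + FX[i][j]
    return F, FM, FY, FX


# Example call with a constant, made-up match exponent.
F, FM, FY, FX = partition_function(
    3, 4, lambda i, j: 0.5, beta=1.0, gap=-2.0, ext=-0.5
)
```

Passing the exponent as a callable keeps the sketch independent of how the substitution, secondary-structure and solvent-accessibility terms are actually scored. For long sequences the raw products can overflow or underflow floating-point range, so a practical version would likely work in log space.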