Fig. 1 Hidden Markov model for pairwise sequence alignment
the Viterbi algorithm [3]. The posterior probabilities can then be obtained by

$$P(x_i \sim y_j \in a \mid x, y) = \frac{f(i,j)\, b(i,j)}{P(x,y)}. \qquad (2)$$
In the above equation, $f(i,j)$ is the sum of the probabilities of all alignments of $x_{1 \ldots i}$ and $y_{1 \ldots j}$, where $x_{1 \ldots i}$ denotes the first $i$ characters of sequence $x$ and $y_{1 \ldots j}$ is defined the same way. The term $b(i,j)$ is the sum of the probabilities of all alignments of $x_{i+1 \ldots m}$ and $y_{j+1 \ldots n}$, where $m$ and $n$ are the lengths of sequences $x$ and $y$, respectively. Finally, $P(x,y)$ is the sum of the probabilities of all alignments of $x$ and $y$ under the model. These quantities can be obtained by modifying the Viterbi algorithm to add instead of taking the max, as shown in Durbin [3].
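The add-instead-of-max idea can be sketched with a small dynamic program. The following Python sketch assumes a simplified pair model (a single match state with linear gaps, rather than the full affine-gap pair HMM of Fig. 1); `p_match` and `q` are illustrative emission probabilities, not values from the chapter.

```python
def pair_posteriors(x, y, p_match, q):
    """Posterior alignment probabilities P(x_i ~ y_j | x, y), eq. (2),
    for a simplified pair model: the Viterbi max is replaced by a sum
    to obtain the forward (f) and backward (b) quantities."""
    m, n = len(x), len(y)
    # f[i][j]: total probability of all alignments of x[:i] and y[:j].
    f = [[0.0] * (n + 1) for _ in range(m + 1)]
    f[0][0] = 1.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            s = 0.0
            if i > 0 and j > 0:
                s += p_match(x[i - 1], y[j - 1]) * f[i - 1][j - 1]
            if i > 0:
                s += q(x[i - 1]) * f[i - 1][j]   # x_i against a gap
            if j > 0:
                s += q(y[j - 1]) * f[i][j - 1]   # y_j against a gap
            f[i][j] = s
    # b[i][j]: total probability of all alignments of x[i:] and y[j:].
    b = [[0.0] * (n + 1) for _ in range(m + 1)]
    b[m][n] = 1.0
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            if i == m and j == n:
                continue
            s = 0.0
            if i < m and j < n:
                s += p_match(x[i], y[j]) * b[i + 1][j + 1]
            if i < m:
                s += q(x[i]) * b[i + 1][j]
            if j < n:
                s += q(y[j]) * b[i][j + 1]
            b[i][j] = s
    P_xy = f[m][n]  # P(x, y): sum over all alignments
    # Posterior that x_i is aligned to y_j (i, j are 1-based in eq. (2)):
    post = [[f[i - 1][j - 1] * p_match(x[i - 1], y[j - 1]) * b[i][j] / P_xy
             for j in range(1, n + 1)] for i in range(1, m + 1)]
    return post, P_xy
```

Because the subset of alignment paths passing through a given aligned pair is contained in the set of all paths, each posterior lies between 0 and 1, and the forward and backward tables agree on $P(x,y)$.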
2.3 Posterior Probabilities by Partition Function

Amino acid scoring matrices that are normally used for sequence alignment are represented as log-odds scoring matrices, as defined by Dayhoff [15]. The commonly used sum-of-pairs score of an alignment $a$ [3] is defined as the sum of residue-residue pairs and residue-gap pairs under an affine penalty scheme:

$$S(a) = \sum_{(i,j) \in a} T \ln\!\left(\frac{M_{ij}}{f_i f_j}\right) + \text{gap penalties}. \qquad (3)$$
Here $T$ is a constant set according to the scoring matrix, $M_{ij}$ is the mutation probability of residue $i$ changing to $j$, and $f_i$ and $f_j$ are the background frequencies of residues $i$ and $j$. In fact, it can be shown that any scoring matrix corresponds to a log-odds matrix [16, 17].
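Once the aligned residue pairs of $a$ are known, eq. (3) is a direct sum. A minimal Python sketch, in which the mutation probabilities `M`, background frequencies `f`, constant `T`, and the precomputed total `gap_penalty` are all toy placeholders rather than values from any published matrix:

```python
import math

def alignment_score(pairs, M, f, T, gap_penalty):
    """Sum-of-pairs score S(a) of eq. (3): summed log-odds over the
    aligned residue pairs of alignment a, plus the (affine) gap
    penalties passed in as a precomputed total."""
    log_odds = sum(T * math.log(M[(i, j)] / (f[i] * f[j])) for i, j in pairs)
    return log_odds + gap_penalty

# Toy two-residue alphabet: M holds mutation probabilities,
# f the background frequencies (illustrative values only).
M = {("A", "A"): 0.4, ("A", "R"): 0.1, ("R", "A"): 0.1, ("R", "R"): 0.4}
f = {"A": 0.5, "R": 0.5}
score = alignment_score([("A", "A"), ("A", "R")], M, f, T=1.0, gap_penalty=-2.0)
```

Pairs that are more probable than chance ($M_{ij} > f_i f_j$) contribute positively; pairs less probable than chance contribute negatively, as expected of a log-odds score.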
Miyazawa [14] proposed that the probability of an alignment $P(a)$ of sequences $x$ and $y$ can be defined as

$$P(a) \propto e^{S(a)/T}, \qquad (4)$$