Fig. 1 Hidden Markov model for pairwise sequence alignment
the Viterbi algorithm [ 3 ]. The posterior probabilities can then be
obtained by
P(x_i \sim y_j \in a \mid x, y) = \frac{f(i,j)\, b(i,j)}{P(x,y)}    (2)
In the above equation f(i, j) is the sum of the probabilities of all alignments of x_{1...i} and y_{1...j}, where x_{1...i} denotes the first i characters of sequence x and y_{1...j} is defined the same way. The term b(i, j) is the sum of the probabilities of all alignments of x_{i+1...m} and y_{j+1...n}, where m and n are the lengths of sequences x and y, respectively. Finally, P(x, y) is the sum of the probabilities of all alignments of x and y under the model. These quantities can be obtained by modifying the Viterbi algorithm to sum over paths instead of taking the maximum (the forward and backward algorithms), as shown in Durbin [3].
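As a concrete illustration, the forward and backward recursions above can be sketched for a much simplified pair model: a single match state with linear (non-affine) gap probabilities, rather than the full pair HMM of Fig. 1. The emission table p, background frequencies q, and gap probability g below are toy assumptions, not values from the text.

```python
def posteriors(x, y, p, q, g):
    """Return P(x,y) and the matrix of posterior match probabilities.

    f(i,j): total probability of all alignments of x[:i] and y[:j]
    b(i,j): total probability of all alignments of x[i:] and y[j:]
    posterior(i,j) = p(x_i, y_j) * f(i-1, j-1) * b(i, j) / P(x, y)
    """
    m, n = len(x), len(y)
    # Forward: same recursion as Viterbi, but summing instead of max.
    f = [[0.0] * (n + 1) for _ in range(m + 1)]
    f[0][0] = 1.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            tot = 0.0
            if i > 0 and j > 0:
                tot += p[(x[i-1], y[j-1])] * f[i-1][j-1]  # x_i aligned to y_j
            if i > 0:
                tot += g * q[x[i-1]] * f[i-1][j]          # x_i against a gap
            if j > 0:
                tot += g * q[y[j-1]] * f[i][j-1]          # y_j against a gap
            f[i][j] = tot
    # Backward: the mirror recursion, run in from the ends of the sequences.
    b = [[0.0] * (n + 1) for _ in range(m + 1)]
    b[m][n] = 1.0
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            if i == m and j == n:
                continue
            tot = 0.0
            if i < m and j < n:
                tot += p[(x[i], y[j])] * b[i+1][j+1]
            if i < m:
                tot += g * q[x[i]] * b[i+1][j]
            if j < n:
                tot += g * q[y[j]] * b[i][j+1]
            b[i][j] = tot
    Pxy = f[m][n]  # sum of the probabilities of all alignments of x and y
    post = [[p[(x[i-1], y[j-1])] * f[i-1][j-1] * b[i][j] / Pxy
             for j in range(1, n + 1)] for i in range(1, m + 1)]
    return Pxy, post
```

A useful internal check is that the backward value b(0, 0) reproduces P(x, y) computed by the forward pass.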
2.3 Posterior Probabilities by Partition Function

Amino acid scoring matrices that are normally used for sequence alignment are represented as log-odds scoring matrices as defined by Dayhoff [15]. The commonly used sum-of-pairs score of an alignment [3] is defined as the sum of the scores of its residue-residue pairs and residue-gap pairs under an affine gap penalty scheme.
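A minimal sketch of such a sum-of-pairs score for two aligned rows, assuming a toy substitution table S and an affine scheme in which a gap of length k costs gap_open + (k-1)*gap_extend; for simplicity, consecutive gap columns are treated as one gap regardless of which row carries them.

```python
def sum_of_pairs(ax, ay, S, gap_open, gap_extend):
    """Score two aligned rows of equal length; '-' marks a gap.

    Residue-residue columns add the substitution score; residue-gap
    columns pay gap_open for the first column of a gap and gap_extend
    for each further column (affine penalty).
    """
    score = 0.0
    in_gap = False  # are we extending an already-open gap?
    for a, b in zip(ax, ay):
        if a == '-' or b == '-':
            score -= gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += S.get((a, b), S.get((b, a)))  # symmetric lookup
            in_gap = False
    return score
```

For example, with match = 2, mismatch = -1, gap_open = 5, gap_extend = 1, the alignment AC-A / ACCA scores 2 + 2 - 5 + 2 = 1.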
S(a) = \sum_{(i,j) \in a} T \ln\!\left(\frac{M_{ij}}{f_i f_j}\right) + \text{gap penalties}    (3)
Here T is a constant set according to the scoring matrix, M_ij is the mutation probability of residue i changing to residue j, and f_i and f_j are the background frequencies of residues i and j. In fact, it can be shown that any scoring matrix corresponds to a log-odds matrix [16, 17]. Miyazawa [14] proposed that the probability P(a) of an alignment a of sequences x and y can be defined as
P(a) \propto e^{S(a)/T},    (4)
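Both relations can be checked numerically: inverting Eq. (3) for a single pair recovers the mutation probability implied by a log-odds score, M_ij = f_i f_j e^{S_ij/T}, and Eq. (4) turns alignment scores into Boltzmann-like weights normalized by a partition function. The sketch below uses toy scores, frequencies, and T, which are assumptions, not values from the text.

```python
import math

def score_to_prob(s_ij, f_i, f_j, T):
    """Invert Eq. (3) for one residue pair: M_ij = f_i * f_j * exp(s_ij / T)."""
    return f_i * f_j * math.exp(s_ij / T)

def alignment_weights(scores, T):
    """Eq. (4): P(a) proportional to exp(S(a)/T), normalized over a given
    list of candidate alignment scores via the partition function Z."""
    weights = [math.exp(s / T) for s in scores]
    Z = sum(weights)  # partition function over the candidates
    return [w / Z for w in weights]
```

Note the role of T: as T grows the weights flatten toward uniform, and as T shrinks the probability mass concentrates on the best-scoring alignment.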