Probcons computes these probabilities using a Hidden Markov
Model (HMM) for pairwise sequence alignment. The HMM
parameters are learned using unsupervised learning on the
BAliBASE 2.0 benchmark.
Probalign [13], on the other hand, estimates amino acid posterior
probabilities from the partition function of alignments as
described by Miyazawa [14]. It then proceeds to compute the
maximal expected accuracy multiple sequence alignment by following
the strategy of Probcons. We first describe both methods of
computing posterior probabilities in detail below. We then describe
the Probcons alignment algorithm that makes use of the
probabilities to output a final alignment. Probalign follows the
same approach.
2 Methods
2.1 Posterior Probabilities for Expected Accuracy Sequence Alignment
The expected accuracy of an alignment is based upon the posterior
probabilities of aligning residues in two sequences. Consider
sequences $x$ and $y$ and let $a^*$ be their true alignment.
Following the description in Do [12], the posterior probability of
residue $x_i$ aligned to $y_j$ in $a^*$ is defined as

$$P(x_i \sim y_j \in a^* \mid x, y) = \sum_{a \in A} P(a \mid x, y)\, \mathbf{1}(x_i \sim y_j \in a) \tag{1}$$

where $A$ is the set of all alignments of $x$ and $y$ and
$\mathbf{1}(expr)$ is the indicator function which returns 1 if the
expression $expr$ evaluates to true and 0 otherwise.
$P(a \mid x, y)$ represents the probability that alignment $a$ is
the true alignment $a^*$. This can easily be calculated using a
pairwise HMM if all the parameters are known (described below).
From here on we represent the posterior probability as
$P(x_i \sim y_j \mid x, y)$ with the understanding that it
represents the probability of $x_i$ aligned to $y_j$ in the true
alignment $a^*$.
According to Eq. 1, as long as we have an ensemble of alignments
$A$ with their probabilities $P(a \mid x, y)$, we can compute the
posterior probability $P(x_i \sim y_j \mid x, y)$ by summing up the
probabilities of alignments where $x_i$ is paired with $y_j$.
Probcons uses hidden Markov models while Probalign uses the
partition function of sequence alignments to generate the ensemble.
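The sum in Eq. 1 can be illustrated directly on very short sequences by listing every alignment, giving each one a probability, and adding up the probabilities of the alignments that pair $x_i$ with $y_j$. The sketch below is only a toy under stated assumptions: the function names and the scoring values (`match`, `mismatch`, `gap`) are hypothetical, the weights are simple exponentiated scores rather than the trained Probcons HMM or the Probalign parameters, and each distinct sequence of columns is counted as a separate alignment.

```python
import math


def enumerate_alignments(n, m):
    """Yield every alignment of two sequences of lengths n and m.

    An alignment is a list of columns: (i, j) pairs x_i with y_j,
    (i, None) puts x_i against a gap, (None, j) puts y_j against a gap.
    """
    def rec(i, j, cols):
        if i == n and j == m:
            yield list(cols)  # copy, because cols is reused
            return
        if i < n and j < m:
            cols.append((i, j))
            yield from rec(i + 1, j + 1, cols)
            cols.pop()
        if i < n:
            cols.append((i, None))
            yield from rec(i + 1, j, cols)
            cols.pop()
        if j < m:
            cols.append((None, j))
            yield from rec(i, j + 1, cols)
            cols.pop()

    yield from rec(0, 0, [])


def toy_weight(x, y, cols, match=1.0, mismatch=-1.0, gap=-1.5):
    """Unnormalized weight of one alignment (hypothetical scoring values)."""
    score = 0.0
    for i, j in cols:
        if i is None or j is None:
            score += gap
        else:
            score += match if x[i] == y[j] else mismatch
    return math.exp(score)


def posterior_matrix(x, y):
    """Apply Eq. 1: post[i][j] = sum over a of P(a | x, y) * 1(x_i ~ y_j in a)."""
    alignments = list(enumerate_alignments(len(x), len(y)))
    weights = [toy_weight(x, y, a) for a in alignments]
    total = sum(weights)  # normalizer, so that the P(a | x, y) sum to 1
    post = [[0.0] * len(y) for _ in x]
    for cols, w in zip(alignments, weights):
        p_a = w / total
        for i, j in cols:
            if i is not None and j is not None:
                post[i][j] += p_a  # indicator is 1 for this (i, j)
    return post


if __name__ == "__main__":
    for row in posterior_matrix("AC", "AC"):
        print(["%.3f" % p for p in row])
```

The number of alignments grows very quickly with sequence length, so explicit enumeration is only workable for sequences of a few residues; it serves here to make the definition concrete, not as a practical way to obtain the posteriors.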
2.2 Posterior Probabilities by Hidden Markov Models
Probcons uses a basic sequence alignment hidden Markov model
(HMM) shown in Fig. 1. The emission probabilities for the hidden
states $M$, $I_x$, and $I_y$ are given by $p(x_i, y_j)$, $q(x_i)$,
and $q(y_j)$, where $x_i$ is the $i$th residue of sequence $x$ and
$y_j$ is defined correspondingly. The terms $\delta$ and
$\varepsilon$ represent transition probabilities for gap open and
gap extensions.
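To make the role of these parameters concrete, the following sketch scores one fixed alignment under a three-state pair HMM. Several things here are assumptions, not taken from the text: the transition layout (M to M with probability 1 - 2δ, M to either insert state with δ, insert to itself with ε, insert back to M with 1 - ε, no direct moves between $I_x$ and $I_y$), the start behaving like state M, the absence of an end state, the four-letter toy alphabet, and all numeric values. The names `p_match`, `q_insert`, and `alignment_log_prob` are hypothetical.

```python
import math

DELTA = 0.1    # assumed gap-open probability (delta)
EPSILON = 0.4  # assumed gap-extension probability (epsilon)

# Assumed transition layout of the three-state pair HMM.
TRANSITIONS = {
    ("M", "M"): 1 - 2 * DELTA,
    ("M", "Ix"): DELTA,
    ("M", "Iy"): DELTA,
    ("Ix", "Ix"): EPSILON,
    ("Ix", "M"): 1 - EPSILON,
    ("Iy", "Iy"): EPSILON,
    ("Iy", "M"): 1 - EPSILON,
}


def p_match(a, b):
    """Toy joint emission p(x_i, y_j) over the alphabet ACGT (sums to 1)."""
    return 0.15 if a == b else 0.4 / 12


def q_insert(a):
    """Toy background emission q(.) for the insert states (uniform)."""
    return 0.25


def alignment_log_prob(row_x, row_y):
    """Log-probability of a gapped alignment given as two equal-length rows.

    '-' marks a gap. State Ix emits a residue of x against a gap, and
    state Iy emits a residue of y against a gap.
    """
    if len(row_x) != len(row_y):
        raise ValueError("alignment rows must have equal length")
    log_p = 0.0
    prev = "M"  # assumption: the start behaves like the match state
    for a, b in zip(row_x, row_y):
        if a != "-" and b != "-":
            state, emit = "M", p_match(a, b)
        elif a != "-":
            state, emit = "Ix", q_insert(a)
        elif b != "-":
            state, emit = "Iy", q_insert(b)
        else:
            raise ValueError("a column cannot contain two gaps")
        trans = TRANSITIONS.get((prev, state))
        if trans is None:
            return float("-inf")  # path not allowed in this layout
        log_p += math.log(trans) + math.log(emit)
        prev = state
    return log_p


if __name__ == "__main__":
    print(alignment_log_prob("AC", "AC"))
    print(alignment_log_prob("A-C", "AGC"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied along a long alignment.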
The probability of a sequence alignment under this model is well-
defined and the one with the highest probability can be found with