Probcons computes these probabilities using a Hidden Markov
Model (HMM) for pairwise sequence alignment. The HMM
parameters are learned using unsupervised learning on the
BAliBASE 2.0 benchmark.
Probalign [13], on the other hand, estimates amino acid posterior probabilities from the partition function of alignments, as described by Miyazawa [14]. It then computes the maximal expected accuracy multiple sequence alignment following the strategy of Probcons. Below, we first describe both methods of computing posterior probabilities in detail, and then describe the Probcons alignment algorithm, which uses these probabilities to output a final alignment. Probalign follows the same approach.
2 Methods
2.1 Posterior Probabilities for Expected Accuracy Sequence Alignment
The expected accuracy of an alignment is based upon the posterior probabilities of aligning residues in two sequences. Consider sequences x and y, and let a* be their true alignment. Following the description in Do [12], the posterior probability of residue x_i being aligned to y_j in a* is defined as
$$P(x_i \sim y_j \in a^* \mid x, y) = \sum_{a \in A} P(a \mid x, y)\, \mathbf{1}\{x_i \sim y_j \in a\} \qquad (1)$$
where A is the set of all alignments of x and y, and 1(expr) is the indicator function, which returns 1 if the expression expr evaluates to true and 0 otherwise. P(a | x, y) represents the probability that alignment a is the true alignment a*. This can easily be calculated using a pairwise HMM if all the model parameters are known (described below). From here on, we write the posterior probability as P(x_i ~ y_j | x, y), with the understanding that it represents the probability of x_i being aligned to y_j in the true alignment a*.
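For very short sequences, Eq. 1 can be evaluated literally: list every alignment in A, give each a probability, and add up the probabilities of the alignments that contain the column x_i ~ y_j. The Python sketch below does this under clearly stated assumptions. P(a | x, y) is taken proportional to exp(score(a)) for a hypothetical linear-gap toy score (match +2, mismatch -1, gap -2), standing in for the HMM or partition-function models discussed in the text, and all helper names are illustrative only. Because the number of alignments grows exponentially with sequence length, this is an illustration, not a practical method.

```python
import math


def enumerate_alignments(n, m, i=0, j=0):
    """Yield every global alignment of x[i:] and y[j:] (full lengths n and m).

    An alignment is a list of columns: (i, j) pairs x_i with y_j,
    (i, None) puts x_i against a gap, and (None, j) puts y_j against a gap.
    Indices are 0-based.
    """
    if i == n and j == m:
        yield []
        return
    if i < n and j < m:
        for rest in enumerate_alignments(n, m, i + 1, j + 1):
            yield [(i, j)] + rest
    if i < n:
        for rest in enumerate_alignments(n, m, i + 1, j):
            yield [(i, None)] + rest
    if j < m:
        for rest in enumerate_alignments(n, m, i, j + 1):
            yield [(None, j)] + rest


def toy_score(aln, x, y, match=2.0, mismatch=-1.0, gap=-2.0):
    """Hypothetical linear-gap score, used only to define P(a | x, y)."""
    total = 0.0
    for i, j in aln:
        if i is None or j is None:
            total += gap
        else:
            total += match if x[i] == y[j] else mismatch
    return total


def posterior_by_enumeration(x, y):
    """Evaluate Eq. 1 literally, with P(a | x, y) proportional to exp(score)."""
    alns = list(enumerate_alignments(len(x), len(y)))
    weights = [math.exp(toy_score(a, x, y)) for a in alns]
    z = sum(weights)
    post = [[0.0] * len(y) for _ in x]
    for aln, w in zip(alns, weights):
        p_a = w / z  # P(a | x, y)
        for i, j in aln:
            if i is not None and j is not None:  # indicator 1(x_i ~ y_j in a)
                post[i][j] += p_a
    return post


post = posterior_by_enumeration("ACG", "AG")
for row in post:
    print(["%.3f" % v for v in row])
```

Each row of the result sums to at most 1, because in any single alignment x_i is paired with at most one residue of y.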
According to Eq. 1, as long as we have an ensemble of alignments A with their probabilities P(a | x, y), we can compute the posterior probability P(x_i ~ y_j | x, y) by summing the probabilities of the alignments in which x_i is paired with y_j. Probcons uses hidden Markov models, while Probalign uses the partition function of sequence alignments to generate the ensemble.
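Listing the members of A is infeasible beyond toy lengths, so the sum in Eq. 1 is computed by dynamic programming in practice. The sketch below illustrates the partition-function idea under simplifying assumptions: a hypothetical linear-gap toy score (not Probalign's actual scoring scheme) and Boltzmann-like weights exp(S(a)/T), where T is a temperature-like parameter in the spirit of Miyazawa [14]. The forward table holds the total weight of all alignments of the prefixes of x and y, and the backward table holds that of the suffixes. The posterior of x_i ~ y_j is the forward weight before that column, times the column's own weight, times the backward weight after it, divided by the total Z. This sums over the same set A as Eq. 1, but in O(nm) time and memory. The function name and parameters are illustrative only.

```python
import math


def partition_posteriors(x, y, match=2.0, mismatch=-1.0, gap=-2.0, T=1.0):
    """Posterior matrix P(x_i ~ y_j | x, y) from a linear-gap partition function.

    Hypothetical toy scoring; each alignment a is weighted by exp(S(a) / T).
    Returns (posteriors, Z). Indices are 0-based.
    """
    n, m = len(x), len(y)

    def w_sub(i, j):
        # Weight of the column pairing x[i] with y[j].
        return math.exp((match if x[i] == y[j] else mismatch) / T)

    w_gap = math.exp(gap / T)  # weight of a column with one gap

    # fwd[i][j]: summed weight of all alignments of x[:i] and y[:j].
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                total += fwd[i - 1][j - 1] * w_sub(i - 1, j - 1)
            if i > 0:
                total += fwd[i - 1][j] * w_gap
            if j > 0:
                total += fwd[i][j - 1] * w_gap
            fwd[i][j] = total

    # bwd[i][j]: summed weight of all alignments of x[i:] and y[j:].
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            total = 0.0
            if i < n and j < m:
                total += bwd[i + 1][j + 1] * w_sub(i, j)
            if i < n:
                total += bwd[i + 1][j] * w_gap
            if j < m:
                total += bwd[i][j + 1] * w_gap
            bwd[i][j] = total

    z = fwd[n][m]  # partition function over all of A
    post = [[fwd[i][j] * w_sub(i, j) * bwd[i + 1][j + 1] / z for j in range(m)]
            for i in range(n)]
    return post, z


post, z = partition_posteriors("ACG", "AG", T=1.0)
```

For long sequences these products overflow or underflow, so a practical version would work in log space or rescale the tables; that is omitted here for clarity.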
2.2 Posterior Probabilities by Hidden Markov Models
Probcons uses a basic sequence alignment hidden Markov model (HMM), shown in Fig. 1.
The emission probabilities for the hidden states M, I_x, and I_y are given by p(x_i, y_j), q(x_i), and q(y_j), where x_i is the ith residue of sequence x and y_j is defined correspondingly. The terms δ and ε represent the transition probabilities for gap opening and gap extension.
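To make the model concrete, the Python sketch below computes the joint probability P(a, x, y) of one fixed alignment as the product of transition and emission probabilities along its state path. Because Fig. 1 is not reproduced here, the transition structure is an assumption borrowed from the standard three-state pair HMM: M moves to each insert state with probability δ and stays in M with probability 1 - 2δ, an insert state extends with ε and returns to M with 1 - ε, there are no direct I_x/I_y transitions, and begin/end states are ignored. The emission tables P and Q are hypothetical toy values over a four-letter alphabet, and the function names are illustrative only.

```python
import math

ALPHABET = "ACGT"  # hypothetical four-letter toy alphabet
# Hypothetical emission tables: P sums to 1 over all pairs, Q over single letters.
Q = {c: 0.25 for c in ALPHABET}
P = {(a, b): (0.2 if a == b else 0.2 / 12) for a in ALPHABET for b in ALPHABET}


def transition_table(delta, eps):
    """Assumed three-state pair-HMM transitions (no direct Ix <-> Iy moves)."""
    return {
        ("M", "M"): 1 - 2 * delta, ("M", "Ix"): delta, ("M", "Iy"): delta,
        ("Ix", "Ix"): eps, ("Ix", "M"): 1 - eps, ("Ix", "Iy"): 0.0,
        ("Iy", "Iy"): eps, ("Iy", "M"): 1 - eps, ("Iy", "Ix"): 0.0,
    }


def alignment_log_prob(ax, ay, p, q, delta, eps):
    """log P(a, x, y) for an alignment given as two equal-length gapped strings.

    Assumes the path starts as if leaving M and ignores begin/end states.
    """
    if len(ax) != len(ay):
        raise ValueError("gapped strings must have equal length")
    trans = transition_table(delta, eps)
    state, logp = "M", 0.0
    for a, b in zip(ax, ay):
        if a == "-" and b == "-":
            raise ValueError("column with two gaps")
        if b == "-":
            nxt, emit = "Ix", q[a]      # x_i against a gap: emitted by I_x
        elif a == "-":
            nxt, emit = "Iy", q[b]      # y_j against a gap: emitted by I_y
        else:
            nxt, emit = "M", p[(a, b)]  # x_i aligned to y_j: emitted by M
        t = trans[(state, nxt)]
        if t == 0.0:
            return float("-inf")        # forbidden transition
        logp += math.log(t) + math.log(emit)
        state = nxt
    return logp


lp = alignment_log_prob("ACG", "A-G", P, Q, delta=0.1, eps=0.4)
print(math.exp(lp))
```

For the alignment of ACG with A-G, the path is M, I_x, M, so the probability is (0.8 · 0.2) · (0.1 · 0.25) · (0.6 · 0.2) = 4.8 × 10^-4. Working with log probabilities keeps longer alignments from underflowing.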
The probability of a sequence alignment under this model is well-defined, and the one with the highest probability can be found with