Multiple Sequence Alignment Using Probcons and Probalign - Multiple Sequence Alignment Methods


Probcons computes these probabilities using a hidden Markov model (HMM) for pairwise sequence alignment. The HMM parameters are learned by unsupervised learning on the BAliBASE 2.0 benchmark.

Probalign [13], on the other hand, estimates amino acid posterior probabilities from the partition function of alignments as described by Miyazawa [14]. It then computes the maximal expected accuracy multiple sequence alignment by following the strategy of Probcons. We first describe both methods of computing posterior probabilities in detail below. We then describe the Probcons alignment algorithm, which uses these probabilities to output a final alignment. Probalign follows the same approach.

2 Methods

2.1 Posterior Probabilities for Expected Accuracy Sequence Alignment

The expected accuracy of an alignment is based upon the posterior probabilities of aligning residues in two sequences. Consider sequences $x$ and $y$ and let $a^*$ be their true alignment. Following the description in Do [12], the posterior probability of residue $x_i$ aligned to $y_j$ in $a^*$ is defined as

\[
P(x_i \sim y_j \in a^* \mid x, y) = \sum_{a \in A} P(a \mid x, y)\, \mathbf{1}(x_i \sim y_j \in a) \tag{1}
\]

where $A$ is the set of all alignments of $x$ and $y$ and $\mathbf{1}(expr)$ is the indicator function, which returns 1 if the expression $expr$ evaluates to true and 0 otherwise. $P(a \mid x, y)$ represents the probability that alignment $a$ is the true alignment $a^*$. This can easily be calculated using a pairwise HMM if all the parameters are known (described below). From here on we represent the posterior probability as $P(x_i \sim y_j \mid x, y)$, with the understanding that it represents the probability of $x_i$ aligned to $y_j$ in the true alignment $a^*$.

According to Eq. 1, as long as we have an ensemble of alignments $A$ with their probabilities $P(a \mid x, y)$, we can compute the posterior probability $P(x_i \sim y_j \mid x, y)$ by summing up the probabilities of alignments where $x_i$ is paired with $y_j$. Probcons uses hidden Markov models while Probalign uses the partition function of sequence alignments to generate the ensemble.
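The summation in Eq. 1 can be illustrated with a minimal Python sketch. It assumes the ensemble is small enough to list explicitly, which is not how Probcons or Probalign operate in practice (both use dynamic programming). The function name `posterior_from_ensemble`, the 0-based index pairs, and the toy probabilities are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def posterior_from_ensemble(ensemble):
    """Sum alignment probabilities per aligned residue pair (Eq. 1).

    ensemble: list of (pairs, prob) tuples, where pairs is a set of
    0-based (i, j) index pairs meaning x[i] is aligned to y[j], and
    prob is the (possibly unnormalized) probability of that alignment.
    """
    total = sum(prob for _, prob in ensemble)
    posterior = defaultdict(float)
    for pairs, prob in ensemble:
        for pair in pairs:
            # The indicator function is 1 exactly for the pairs in this alignment.
            posterior[pair] += prob / total
    return dict(posterior)


# Hypothetical toy ensemble for two sequences of length 2.
toy_ensemble = [
    (frozenset({(0, 0), (1, 1)}), 0.6),  # x0~y0 and x1~y1
    (frozenset({(0, 0)}), 0.3),          # x0~y0, x1 and y1 gapped
    (frozenset({(1, 0)}), 0.1),          # x1~y0, x0 and y1 gapped
]
toy_posterior = posterior_from_ensemble(toy_ensemble)
```

In this toy ensemble the pair (0, 0) appears in the first two alignments, so its posterior is the sum of their normalized probabilities.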

2.2 Posterior Probabilities by Hidden Markov Models

Probcons uses a basic sequence alignment hidden Markov model (HMM) shown in Fig. 1. The emission probabilities for the hidden states $M$, $I_x$, and $I_y$ are given by $p(x_i, y_j)$, $q(x_i)$, and $q(y_j)$, where $x_i$ is the $i$th residue of sequence $x$ and $y_j$ is defined correspondingly. The terms $\delta$ and $\varepsilon$ represent transition probabilities for gap open and gap extension, respectively.
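As a rough illustration of how emission and transition parameters combine, the Python sketch below scores one fixed alignment path under a three-state pair HMM. The topology is an assumption, since Fig. 1 is not reproduced here: the model starts as if in state M, M moves to either insert state with probability `delta`, an insert state extends with probability `eps` and returns to M otherwise, and there is no direct transition between the two insert states and no explicit end state. The function name `alignment_log_prob`, the path encoding ("M", "X" for I_x, "Y" for I_y), and all parameter values are hypothetical.

```python
import math


def alignment_log_prob(x, y, path, p, q, delta, eps):
    """Log-probability of one alignment path under an assumed pair HMM.

    path: string over 'M' (match), 'X' (I_x, emits x[i]), 'Y' (I_y, emits y[j]).
    p: dict mapping (residue, residue) to a match emission probability.
    q: dict mapping residue to an insert emission probability.
    """
    # Assumed transition table; pairs that are absent have probability 0.
    trans = {
        ("M", "M"): 1.0 - 2.0 * delta,
        ("M", "X"): delta,
        ("M", "Y"): delta,
        ("X", "X"): eps,
        ("X", "M"): 1.0 - eps,
        ("Y", "Y"): eps,
        ("Y", "M"): 1.0 - eps,
    }
    i = j = 0
    logp = 0.0
    prev = "M"  # assumption: the start state behaves like M
    for state in path:
        t = trans.get((prev, state), 0.0)
        if t <= 0.0:
            return -math.inf  # disallowed transition or unknown state
        logp += math.log(t)
        if state == "M":
            if i >= len(x) or j >= len(y):
                raise ValueError("path consumes more residues than available")
            logp += math.log(p[(x[i], y[j])])
            i, j = i + 1, j + 1
        elif state == "X":
            if i >= len(x):
                raise ValueError("path consumes more residues than available")
            logp += math.log(q[x[i]])
            i += 1
        else:  # state == "Y"
            if j >= len(y):
                raise ValueError("path consumes more residues than available")
            logp += math.log(q[y[j]])
            j += 1
        prev = state
    if (i, j) != (len(x), len(y)):
        raise ValueError("path does not consume both sequences")
    return logp


# Hypothetical two-letter alphabet and parameters.
toy_q = {"A": 0.5, "B": 0.5}
toy_p = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
lp_match = alignment_log_prob("AB", "AB", "MM", toy_p, toy_q, delta=0.1, eps=0.5)
lp_gap = alignment_log_prob("AB", "A", "MX", toy_p, toy_q, delta=0.1, eps=0.5)
```

Working in log space avoids underflow on longer sequences; exponentiating the result recovers the plain product of transition and emission probabilities along the path.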

The probability of a sequence alignment under this model is well-defined and the one with the highest probability can be found with
