Fig. 1 Hidden Markov model for pairwise sequence alignment
the Viterbi algorithm [3]. The posterior probabilities can then be obtained by

$$P(x_i \sim y_j \in a \mid x, y) = \frac{f(i,j)\, b(i,j)}{P(x,y)}. \qquad (2)$$
In the above equation, $f(i,j)$ is the sum of the probabilities of all alignments of $x_{1 \ldots i}$ and $y_{1 \ldots j}$, where $x_{1 \ldots i}$ denotes the first $i$ characters of sequence $x$ and $y_{1 \ldots j}$ is defined the same way. The term $b(i,j)$ is the sum of the probabilities of all alignments of $x_{i+1 \ldots m}$ and $y_{j+1 \ldots n}$, where $m$ and $n$ are the lengths of sequences $x$ and $y$, respectively. Finally, $P(x,y)$ is the sum of the probabilities of all alignments of $x$ and $y$ under the model. These quantities can be obtained by modifying the Viterbi algorithm to add instead of taking the max, as shown in Durbin [3].
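The add-instead-of-max idea can be sketched with a small dynamic program. The following Python sketch assumes a simplified pair model (a single match state with linear gaps, rather than the full affine-gap pair HMM of Fig. 1); `p_match` and `q` are illustrative emission probabilities, not values from the chapter.

```python
def pair_posteriors(x, y, p_match, q):
    """Posterior alignment probabilities P(x_i ~ y_j | x, y), eq. (2),
    for a simplified pair model: the Viterbi max is replaced by a sum
    to obtain the forward (f) and backward (b) quantities."""
    m, n = len(x), len(y)
    # f[i][j]: total probability of all alignments of x[:i] and y[:j].
    f = [[0.0] * (n + 1) for _ in range(m + 1)]
    f[0][0] = 1.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            s = 0.0
            if i > 0 and j > 0:
                s += p_match(x[i - 1], y[j - 1]) * f[i - 1][j - 1]
            if i > 0:
                s += q(x[i - 1]) * f[i - 1][j]   # x_i against a gap
            if j > 0:
                s += q(y[j - 1]) * f[i][j - 1]   # y_j against a gap
            f[i][j] = s
    # b[i][j]: total probability of all alignments of x[i:] and y[j:].
    b = [[0.0] * (n + 1) for _ in range(m + 1)]
    b[m][n] = 1.0
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            if i == m and j == n:
                continue
            s = 0.0
            if i < m and j < n:
                s += p_match(x[i], y[j]) * b[i + 1][j + 1]
            if i < m:
                s += q(x[i]) * b[i + 1][j]
            if j < n:
                s += q(y[j]) * b[i][j + 1]
            b[i][j] = s
    P_xy = f[m][n]  # P(x, y): sum over all alignments
    # Posterior that x_i is aligned to y_j (i, j are 1-based in eq. (2)):
    post = [[f[i - 1][j - 1] * p_match(x[i - 1], y[j - 1]) * b[i][j] / P_xy
             for j in range(1, n + 1)] for i in range(1, m + 1)]
    return post, P_xy
```

Because the subset of alignment paths passing through a given aligned pair is contained in the set of all paths, each posterior lies between 0 and 1, and the forward and backward tables agree on $P(x,y)$.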
2.3 Posterior Probabilities by Partition Function

Amino acid scoring matrices that are normally used for sequence alignment are represented as log-odds scoring matrices, as defined by Dayhoff [15]. The commonly used sum-of-pairs score of an alignment $a$ [3] is defined as the sum of residue-residue pairs and residue-gap pairs under an affine penalty scheme:

$$S(a) = \sum_{(i,j) \in a} T \ln\!\left(\frac{M_{ij}}{f_i f_j}\right) + \text{gap penalties}. \qquad (3)$$
Here $T$ is a constant set according to the scoring matrix, $M_{ij}$ is the mutation probability of residue $i$ changing to $j$, and $f_i$ and $f_j$ are the background frequencies of residues $i$ and $j$. In fact, it can be shown that any scoring matrix corresponds to a log-odds matrix [16, 17].
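Once the aligned residue pairs of $a$ are known, eq. (3) is a direct sum. A minimal Python sketch, in which the mutation probabilities `M`, background frequencies `f`, constant `T`, and the precomputed total `gap_penalty` are all toy placeholders rather than values from any published matrix:

```python
import math

def alignment_score(pairs, M, f, T, gap_penalty):
    """Sum-of-pairs score S(a) of eq. (3): summed log-odds over the
    aligned residue pairs of alignment a, plus the (affine) gap
    penalties passed in as a precomputed total."""
    log_odds = sum(T * math.log(M[(i, j)] / (f[i] * f[j])) for i, j in pairs)
    return log_odds + gap_penalty

# Toy two-residue alphabet: M holds mutation probabilities,
# f the background frequencies (illustrative values only).
M = {("A", "A"): 0.4, ("A", "R"): 0.1, ("R", "A"): 0.1, ("R", "R"): 0.4}
f = {"A": 0.5, "R": 0.5}
score = alignment_score([("A", "A"), ("A", "R")], M, f, T=1.0, gap_penalty=-2.0)
```

Pairs that are more probable than chance ($M_{ij} > f_i f_j$) contribute positively; pairs less probable than chance contribute negatively, as expected of a log-odds score.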
Miyazawa [14] proposed that the probability of an alignment $P(a)$ of sequences $x$ and $y$ can be defined as

$$P(a) \propto e^{S(a)/T}, \qquad (4)$$