Heuristic Alignment Methods - Multiple Sequence Alignment Methods


The optimal alignment score is given by $H(\mathbf{a}, \mathbf{b}) = H_{m,n}$, and the associated alignment is obtained by a traceback procedure. Today, the affine gap penalty function is adopted in nearly all pairwise and multiple sequence alignment programs. However, a slightly more general “piecewise linear gap penalty function” [11] may be preferred when long gaps are expected, e.g., when genomic sequences are to be aligned. The so-called “double affine gap penalty” corresponds to the simplest case, in which the number of pieces is two; its computational cost is only marginally (20–30 %) higher than that of the usual affine gap penalty function.
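The difference between the two penalty shapes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a gap of length k is assumed to cost v for its first residue and u for each further one, the piecewise linear penalty is assumed to be the minimum over several such affine pieces (one common formulation), and the function names and parameter values are hypothetical.

```python
def affine_gap(k, v, u):
    """Penalty of a gap of length k >= 1: v for the first residue, u for each further one."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return v + u * (k - 1)


def piecewise_linear_gap(k, pieces):
    """Minimum over several affine pieces, each given as a (v, u) pair (assumed form)."""
    return min(affine_gap(k, v, u) for v, u in pieces)


# Hypothetical parameters, chosen only for illustration.
SHORT = (10.0, 2.0)   # cheap to open, expensive to extend
LONG = (30.0, 0.5)    # expensive to open, cheap to extend

for k in (1, 5, 20, 100):
    single = affine_gap(k, *SHORT)
    double = piecewise_linear_gap(k, [SHORT, LONG])
    print(f"k={k:3d}  affine={single:6.1f}  double affine={double:6.1f}")
```

With two pieces, short gaps are charged by the first piece and long gaps by the second, so a long gap is penalized much less than under a single affine function.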

Although the above-mentioned algorithms generally produce only one best alignment, optimal alignments are often degenerate, i.e., several alternative alignments have the same optimal score [11]. If we extend our attention to only slightly suboptimal solutions, many optimal and near-optimal alignments may be found [12]. Instead of enumerating all these optimal and near-optimal alignments, however, we can obtain more informative statistical features associated with all possible alignments of the two sequences by means of the so-called probabilistic alignment methods [13, 14]. While these initial studies attempted to mimic real evolutionary processes, Miyazawa [15] reached a related idea inspired by statistical physics; he considered that the optimal alignment mentioned above corresponds to the state of minimal energy, or minimal free energy at 0 K, whereas more realistic views might be obtained by minimizing the free energy at an ambient temperature, $T > 0$ K. To do so, the partial alignment scores shown in Eq. 1 are replaced by “partition functions” $Z^{X}_{i,j}$ ($X = H$, $E$, or $F$), which follow a set of recurrence relations analogous to Eq. 1:

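$$
\begin{aligned}
Z^{H}_{i,j} &= Z_{i-1,j-1}\, z_{i,j}, \qquad z_{i,j} = e^{S(a_i, b_j)/T},\\
Z^{E}_{i,j} &= Z^{H}_{i-1,j}\, e^{-v/T} + Z^{E}_{i-1,j}\, e^{-u/T} + Z^{F}_{i-1,j}\, e^{-v/T},\\
Z^{F}_{i,j} &= Z^{H}_{i,j-1}\, e^{-v/T} + Z^{F}_{i,j-1}\, e^{-u/T} + Z^{E}_{i,j-1}\, e^{-v/T},\\
Z_{i,j} &= Z^{H}_{i,j} + Z^{E}_{i,j} + Z^{F}_{i,j}.
\end{aligned}
\tag{2}
$$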
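A forward recursion of this kind can be sketched in Python as follows. This is a minimal sketch, not the chapter's implementation: it assumes that the match state is weighted by exp(S/T), that opening a gap is weighted by exp(-v/T) and extending one by exp(-u/T), that a gap in one sequence may directly follow a gap in the other, and that the empty alignment has weight 1. The names `forward_partition` and `simple_score` and all parameter values are hypothetical.

```python
import math


def simple_score(x, y):
    """Hypothetical substitution score: +1 for a match, -1 for a mismatch."""
    return 1.0 if x == y else -1.0


def forward_partition(a, b, score, v, u, T):
    """Return the total partition function Z[m][n] for sequences a and b."""
    m, n = len(a), len(b)
    open_w = math.exp(-v / T)    # Boltzmann weight for opening a gap
    ext_w = math.exp(-u / T)     # Boltzmann weight for extending a gap

    # ZH: last column aligns a[i] with b[j]; ZE: a[i] against a gap; ZF: b[j] against a gap.
    ZH = [[0.0] * (n + 1) for _ in range(m + 1)]
    ZE = [[0.0] * (n + 1) for _ in range(m + 1)]
    ZF = [[0.0] * (n + 1) for _ in range(m + 1)]
    Z = [[0.0] * (n + 1) for _ in range(m + 1)]
    ZH[0][0] = Z[0][0] = 1.0     # assumed boundary: the empty alignment has weight 1

    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                z = math.exp(score(a[i - 1], b[j - 1]) / T)
                ZH[i][j] = Z[i - 1][j - 1] * z
            if i > 0:
                ZE[i][j] = (ZH[i - 1][j] + ZF[i - 1][j]) * open_w + ZE[i - 1][j] * ext_w
            if j > 0:
                ZF[i][j] = (ZH[i][j - 1] + ZE[i][j - 1]) * open_w + ZF[i][j - 1] * ext_w
            Z[i][j] = ZH[i][j] + ZE[i][j] + ZF[i][j]
    return Z[m][n]


# As T decreases, T * ln Z approaches the optimal alignment score.
for T in (1.0, 0.2, 0.05):
    Z = forward_partition("ACGT", "AGT", simple_score, v=2.0, u=1.0, T=T)
    print(f"T={T:4.2f}  T*ln(Z)={T * math.log(Z):.6f}")
```

At low temperature the sum is dominated by the best-scoring alignment, which is the sense in which the optimal alignment corresponds to the 0 K limit. For long sequences or very small T the weights overflow ordinary floating point, so a practical version would work in log space or rescale each row.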

To obtain interesting statistical features, e.g., $p_{i,j} = P(a_i \sim b_j)$, the probability of $a_i$ and $b_j$ being aligned, one must calculate another recurrence for the “backward partition function” $\bar{Z}^{X}_{i,j}$ starting from the back end. The posterior probability $p_{i,j}$ is then obtained by:

$$
p_{i,j} = \frac{Z^{H}_{i,j}\, \bar{Z}^{H}_{i,j}\, z_{i,j}^{-1}}{Z(\mathbf{a}, \mathbf{b})}, \tag{3}
$$

where the factor $z_{i,j}^{-1}$ is introduced to compensate for the duplicate multiplication of $z_{i,j}$ in $Z^{H}_{i,j}\, \bar{Z}^{H}_{i,j}$, and $Z(\mathbf{a}, \mathbf{b}) = Z_{m,n}$ is the total partition function. Another formulation of probabilistic alignment
