The optimal alignment score is given by $H(\mathbf{a}, \mathbf{b}) = H_{m,n}$, and
the associated alignment is obtained by a traceback procedure.
Today, the affine gap penalty function is adopted in nearly all
pairwise and multiple sequence alignment programs. However, a
slightly more general “piecewise linear gap penalty function” [11]
may be preferred when long gaps are expected, e.g., when genomic
sequences are to be aligned. The so-called “double affine gap
penalty” corresponds to the simplest case, in which the number of
pieces is two; its computational cost is only marginally (20-30%)
higher than that of the usual affine gap penalty function.
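The difference between the two penalty schemes can be sketched in a few lines of Python. The sketch assumes one common convention that is not spelled out in the text: an affine gap of length k costs `open_cost + extend_cost * k`, and a piecewise linear penalty is the minimum over several such affine pieces. The function names and the numerical parameters are illustrative only.

```python
def affine_gap(k, open_cost=3.0, extend_cost=1.0):
    """Affine penalty for a gap of length k (assumed form: open + extend * k)."""
    if k <= 0:
        return 0.0
    return open_cost + extend_cost * k


def piecewise_linear_gap(k, pieces=((3.0, 1.0), (12.0, 0.1))):
    """Minimum over affine pieces given as (open_cost, extend_cost) pairs.

    Two pieces give a "double affine" penalty: the first piece governs
    short gaps, the second (costlier to open, cheaper to extend) long gaps.
    """
    if k <= 0:
        return 0.0
    return min(o + e * k for o, e in pieces)


if __name__ == "__main__":
    # Compare the two penalties for short and long gaps.
    for k in (1, 5, 10, 50, 200):
        print(k, affine_gap(k), piecewise_linear_gap(k))
```

With these illustrative parameters the two functions agree up to k = 10, after which the second piece takes over and long gaps are penalized far less steeply than under the single affine function.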
Although the above-mentioned algorithms generally produce
only one best alignment, optimal alignments are often degenerate,
i.e., several alternative alignments have the same optimal score
[11]. If we extend our attention to slightly suboptimal solutions,
many optimal and near-optimal alignments may be found [12].
Instead of enumerating all these optimal and near-optimal
alignments, however, we can obtain more informative statistical
features associated with all possible alignments of the two
sequences by means of the so-called probabilistic alignment
methods [13, 14].
While these initial studies attempted to mimic real evolutionary
processes, Miyazawa [15] reached a related idea inspired by
statistical physics; he considered that the optimal alignment
mentioned above corresponds to the state of minimal energy, or
minimal free energy at 0 K, whereas more realistic views might be
obtained by minimizing the free energy at an ambient temperature,
$T > 0$ K. To do so, the partial alignment scores shown in Eq. 1
are replaced by “partition functions” $Z^X_{i,j}$ ($X = H$, $E$,
or $F$), which follow a set of recurrence relations analogous to
Eq. 1:
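$$
\begin{aligned}
Z^H_{i,j} &= Z_{i-1,j-1}\, z_{i,j}, \qquad z_{i,j} = e^{S(a_i, b_j)/T} \\
Z^E_{i,j} &= Z^E_{i-1,j}\, e^{-u/T} + \left( Z_{i-1,j} - Z^E_{i-1,j} \right) e^{-v/T} \\
Z^F_{i,j} &= Z^F_{i,j-1}\, e^{-u/T} + \left( Z_{i,j-1} - Z^F_{i,j-1} \right) e^{-v/T} \\
Z_{i,j} &= Z^H_{i,j} + Z^E_{i,j} + Z^F_{i,j}
\end{aligned}
\tag{2}
$$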
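The forward recurrences of Eq. 2 can be illustrated with a minimal Python sketch. Several things here are assumptions rather than part of the text: the toy match/mismatch scores standing in for $S(a_i, b_j)$, the boundary condition $Z_{0,0} = 1$, the default values of `u` and `v`, and the reading of the recurrences in which continuing a gap is weighted by $e^{-u/T}$ and starting one by $e^{-v/T}$. The function names are hypothetical.

```python
import math


def match_score(x, y, match=2.0, mismatch=-1.0):
    """Toy substitution score S(a_i, b_j); a real matrix would go here."""
    return match if x == y else mismatch


def partition_function(a, b, T, u=1.0, v=3.0, S=match_score):
    """Total partition function Z_{m,n} over all alignments of a and b.

    Assumed reading of the recurrences: extending a gap run is weighted by
    exp(-u/T), starting one by exp(-v/T), so a gap of length k contributes
    exp(-(v + (k - 1) * u) / T).
    """
    m, n = len(a), len(b)
    eu, ev = math.exp(-u / T), math.exp(-v / T)
    ZE = [[0.0] * (n + 1) for _ in range(m + 1)]  # last column: a_i opposite a gap
    ZF = [[0.0] * (n + 1) for _ in range(m + 1)]  # last column: b_j opposite a gap
    Z = [[0.0] * (n + 1) for _ in range(m + 1)]   # Z^H + Z^E + Z^F
    Z[0][0] = 1.0  # assumed boundary condition: the empty alignment has weight 1
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            zh = 0.0
            if i > 0 and j > 0:
                # Z^H: extend any alignment of the prefixes by the pair (a_i, b_j).
                zh = Z[i - 1][j - 1] * math.exp(S(a[i - 1], b[j - 1]) / T)
            if i > 0:
                # Z^E: continue an E run, or start one from a non-E state.
                ZE[i][j] = ZE[i - 1][j] * eu + (Z[i - 1][j] - ZE[i - 1][j]) * ev
            if j > 0:
                # Z^F: the same with the roles of the two sequences exchanged.
                ZF[i][j] = ZF[i][j - 1] * eu + (Z[i][j - 1] - ZF[i][j - 1]) * ev
            Z[i][j] = zh + ZE[i][j] + ZF[i][j]
    return Z[m][n]


if __name__ == "__main__":
    # T * ln Z plays the role of a finite-temperature alignment score.
    for T in (2.0, 1.0, 0.5, 0.1):
        Z = partition_function("ACGT", "ACGT", T)
        print(T, T * math.log(Z))
```

For these identical toy sequences, the printed quantity T ln Z decreases toward the optimal alignment score (8 with the assumed scores) as T approaches 0, which illustrates the zero-temperature limit described above. Plain floating-point numbers overflow for long sequences or very small T, so a practical implementation would work in log space or rescale the partition functions.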
To obtain interesting statistical features, e.g., $p_{i,j} = P(a_i \sim b_j)$,
the probability of $a_i$ and $b_j$ being aligned, one must calculate
another recurrence for the “backward partition function” $Z'^{X}_{i,j}$
starting from the back end. The posterior probability $p_{i,j}$ is then
obtained by:

$$
p_{i,j} = \frac{Z^H_{i,j}\, Z'^{H}_{i,j}\, z^{-1}_{i,j}}{Z(\mathbf{a}, \mathbf{b})}
\tag{3}
$$

where the factor $z^{-1}_{i,j}$ is introduced to compensate for the duplicate
multiplication of $z_{i,j}$ in $Z^H_{i,j} Z'^{H}_{i,j}$, and
$Z(\mathbf{a}, \mathbf{b}) = Z_{m,n}$ is the total
partition function. Another formulation of probabilistic alignment