$$E\langle i,k,j,l\rangle = \log \sum_{d_{ik},\, d_{jl}} \frac{p(d_{ik} \mid c_i, c_k, m_{ik})\; p(d_{jl} \mid c_j, c_l, m_{jl})\; p(d_{ik}, d_{jl})}{P_{\mathrm{ref}}(d_{ik})\, P_{\mathrm{ref}}(d_{jl})} \tag{2.6}$$
The terms in Eq. (2.6) are defined as follows:

- $p(d_{ik} \mid c_i, c_k, m_{ik})$ is the probability of two nodes $i$ and $k$ in T interacting at distance $d_{ik}$.
- $p(d_{jl} \mid c_j, c_l, m_{jl})$ is the probability of two nodes $j$ and $l$ in S interacting at distance $d_{jl}$.
- $p(d_{ik}, d_{jl})$ is the probability of one distance $d_{ik}$ being aligned to another distance $d_{jl}$ in reference alignments.
- $P_{\mathrm{ref}}(d_{ik})$ ($P_{\mathrm{ref}}(d_{jl})$) is the background probability of observing $d_{ik}$ ($d_{jl}$) in a protein structure.

Meanwhile, $c_i$ and $c_k$ are position-specific features centered at the $i$th and $k$th residues, respectively. $m_{ik}$ represents the mutual information between the $i$th and $k$th columns in the multiple sequence alignment. $c_j$, $c_l$, and $m_{jl}$ are defined analogously for S.
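To make the summation in Eq. (2.6) concrete, here is a minimal Python sketch. It evaluates the potential for one pair of edges once every term has been discretized into distance bins. It rests on several assumptions:

- The function name and the list-based inputs are hypothetical.
- The two predicted distance distributions, the joint distribution from reference alignments, and the background distribution are assumed to be available already.
- Every background probability is assumed to be non-zero.

```python
import math


def edge_alignment_potential(p_t, p_s, p_joint, p_ref):
    """Evaluate Eq. (2.6) for one edge pair over discretized distance bins.

    Hypothetical input layout (an assumption, not the authors' data format):
      p_t[a]        : p(d_ik = bin a | c_i, c_k, m_ik) for the pair (i, k) in T
      p_s[b]        : p(d_jl = bin b | c_j, c_l, m_jl) for the pair (j, l) in S
      p_joint[a][b] : p(d_ik = bin a, d_jl = bin b) from reference alignments
      p_ref[a]      : background probability P_ref of bin a (assumed non-zero)
    """
    total = 0.0
    # Sum over every combination of distance bins (d_ik, d_jl).
    for a, pa in enumerate(p_t):
        for b, pb in enumerate(p_s):
            total += pa * pb * p_joint[a][b] / (p_ref[a] * p_ref[b])
    return math.log(total)


if __name__ == "__main__":
    # Toy two-bin example with made-up numbers.
    score = edge_alignment_potential(
        p_t=[0.8, 0.2],
        p_s=[0.7, 0.3],
        p_joint=[[0.4, 0.1], [0.1, 0.4]],
        p_ref=[0.5, 0.5],
    )
    print(round(score, 4))  # 0.1345
```

The toy numbers are illustrative only, not data from the text. Both predicted distributions favor the first bin, and the joint distribution favors aligning equal bins. As a result, the sum inside the logarithm is 1.144 and the score is positive.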
Compared to contact-based potentials, here we use interaction at a given distance to obtain a higher-resolution description of the residue interaction pattern, as shown in Fig. 2.5. Therefore, this edge alignment potential is more informative and thus may lead to better alignment accuracy and a higher homology detection rate.
Now we explain how to calculate each term in Eq. (2.6). $P_{\mathrm{ref}}(d_{ik})$ ($P_{\mathrm{ref}}(d_{jl})$) can be calculated by simple counting on a set of non-redundant protein structures, e.g., PDB25. Similar to $P_{\mathrm{ref}}(d_{ik})$, $p(d_{ik}, d_{jl})$ can also be calculated by simple counting, in this case on a set of non-redundant reference alignments. That is, we randomly choose a set of protein pairs such that the two proteins in each pair are similar at least at the fold level. Then we generate their reference alignments (i.e., structure alignments) using the structure alignment tool DeepAlign [15]. Finally, we do simple counting to estimate $p(d_{ik}, d_{jl})$.
In order to use simple counting, we discretize inter-residue distance into 12 intervals: <4, 4–5, 5–6, …, 14–15, and >15 Å.
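The counting step above can be sketched in Python. This is an illustration under stated assumptions, not the authors' pipeline. The assumptions are:

- The input formats are hypothetical, as are all function names. Each chain is a list of coordinates with one representative atom per residue, and each alignment is a list of index pairs.
- The elided intervals are read as 1 Å wide, and each interval is treated as half-open $[lo, hi)$.
- No minimum sequence separation, pseudocounts, or redundancy filtering is applied.
- The structure alignments (e.g., from DeepAlign) are simply passed in.

```python
import bisect
import itertools
import math

# Assumed interval boundaries in Angstroms: <4, 4-5, 5-6, ..., 14-15, >15,
# reading the elided intervals as 1 A wide; intervals are half-open [lo, hi).
EDGES = list(range(4, 16))
N_BINS = len(EDGES) + 1


def distance_bin(d):
    """Return the index of the distance interval that contains d."""
    return bisect.bisect_right(EDGES, d)


def background_distribution(structures):
    """Estimate P_ref(d) by counting all intra-chain residue-pair distances.

    structures: list of chains, each a list of (x, y, z) coordinates with one
    representative atom per residue (hypothetical input format).
    """
    counts = [0] * N_BINS
    for coords in structures:
        for a, b in itertools.combinations(coords, 2):
            counts[distance_bin(math.dist(a, b))] += 1
    total = sum(counts)
    return [c / total for c in counts]


def joint_distribution(aligned_pairs):
    """Estimate p(d_ik, d_jl) from reference (structure) alignments.

    aligned_pairs: list of (coords_t, coords_s, alignment) triples, where
    alignment is a list of (i, j) index pairs, residue i of T aligned to
    residue j of S (hypothetical input format).
    """
    counts = [[0] * N_BINS for _ in range(N_BINS)]
    for coords_t, coords_s, alignment in aligned_pairs:
        # Any two aligned positions (i -> j, k -> l) yield one distance pair.
        for (i, j), (k, l) in itertools.combinations(alignment, 2):
            a = distance_bin(math.dist(coords_t[i], coords_t[k]))
            b = distance_bin(math.dist(coords_s[j], coords_s[l]))
            counts[a][b] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]
```

In this sketch, `background_distribution` plays the role of counting over a non-redundant structure set. `joint_distribution` plays the role of counting over fold-level reference alignments.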
As explained in the previous section, we predict $p(d_{ik} \mid c_i, c_k, m_{ik})$ using a probabilistic neural network (PNN). The PNN is implemented in EPAD, our context-specific distance-dependent statistical potential package. EPAD takes as input sequence profile contexts and mutual information, and then yields the inter-residue distance probability distribution. See the EPAD paper [5] for the technical details.
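One of EPAD's inputs is the mutual information $m_{ik}$ between two alignment columns. This section does not give the exact MI variant EPAD uses. The sketch below therefore computes only the plain, uncorrected mutual information between two MSA columns. It treats gaps as an ordinary symbol and applies no sequence weighting, pseudocounts, or background correction. The function name is hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(msa, i, k):
    """Plain mutual information (in nats) between columns i and k of an MSA.

    msa: list of equal-length aligned sequences (strings). Gaps count as an
    ordinary symbol; no weighting, pseudocounts, or correction are applied.
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_k = Counter(seq[k] for seq in msa)
    count_ik = Counter((seq[i], seq[k]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ik.items():
        p_ab = c / n
        # Sum p(a,b) * log(p(a,b) / (p(a) p(b))) over observed symbol pairs.
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_k[b] / n)))
    return mi


if __name__ == "__main__":
    # Two perfectly co-varying columns: MI equals log 2.
    print(round(column_mutual_information(["AC", "AC", "GT", "GT"], 0, 1), 4))  # 0.6931
```

Columns that vary independently (e.g., `["AC", "AT", "GC", "GT"]`) give an MI of zero. Perfectly co-varying columns give the maximum allowed by their symbol frequencies.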
The EPAD package was blindly tested in CASP10 for template-free modeling. According to the CASP10 Free Modeling assessor, Dr. BK Lee, the results show that EPAD can successfully fold some targets with unusual folds. Our large-scale experimental test also indicates that EPAD is much better than context-independent distance-based pairwise potentials such as DOPE [19], RW [20], and DFIRE [21] at ranking protein decoys [5].