such as PSICOV, Evfold, and plmDCA [2-4] as residue co-evolution. PSICOV assumes
that P(X) is a Gaussian distribution and calculates the correlation between
two columns from the inverse covariance matrix. By contrast, plmDCA does not assume
a Gaussian distribution and is more efficient and also slightly more accurate.
Generally speaking, these programs are time-consuming.
The reliability of mutual information (MI) or direct information (DI) [ 2 ] depends
on the number of non-redundant sequence homologs. When there are few sequence
homologs, the resulting MI or DI is not very accurate. Therefore, it is not enough to
only use residue co-evolution strength to estimate residue interaction strength. We
can use other contact prediction programs such as PhyCMAP [4], which integrates
residue co-evolution information, PSI-BLAST sequence profiles, and other features to
predict the probability that two residues are in contact. PhyCMAP works much better
than PSICOV and Evfold when proteins under study have a small number of
sequence homologs [ 4 ].
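
To make the dependence on the number of homologs concrete, the following is a minimal sketch of the standard mutual information between two alignment columns, estimated from raw frequencies. It is not the procedure used by PSICOV, Evfold, plmDCA, or PhyCMAP: the function name `mutual_information` is hypothetical, and sequence weighting, pseudocounts, and gap handling are all omitted for brevity.

```python
from collections import Counter
from math import log


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two equal-length alignment columns.

    Assumption: probabilities are plain relative frequencies, with no
    sequence weighting, pseudocounts, or special treatment of gaps.
    """
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")

    n = len(col_a)
    count_a = Counter(col_a)               # residue counts in column a
    count_b = Counter(col_b)               # residue counts in column b
    count_ab = Counter(zip(col_a, col_b))  # joint counts of residue pairs

    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


if __name__ == "__main__":
    # Hypothetical toy columns, one character per homologous sequence.
    print(mutual_information("AACC", "GGTT"))  # perfectly co-varying columns
    print(mutual_information("AACC", "GTGT"))  # independent columns
```

With only a handful of sequences, the frequency estimates in `count_ab` rest on very few observations, which is one way to see why MI becomes unreliable when few non-redundant homologs are available.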
In this work, we use the predicted inter-residue Euclidean distance to reflect the
interaction strength of two residues. This is based on the assumption that two
spatially close residues tend to interact strongly. We predict the inter-residue
distance from sequence information such as mutual information and its power series,
the PSI-BLAST sequence profile, and other protein features. See [5] for more details.
Below we briefly describe how to predict inter-residue distance from sequence
information using probabilistic neural networks (PNN).
We discretize the Cα-Cα distance into 13 bins (3-4, 4-5, 5-6, ..., 14-15,
and >15 Å). Each bin is also called a label. Given a protein and a pair of
residues i and j, let $d_k$ denote the bin into which their distance falls, and $x_k$
denote the protein feature vector consisting of position-specific sequence profile
information and the mutual information between the two positions. We would like to
estimate the probability of observing $d_k$ given the feature vector $x_k$. That is,
instead of considering only the most likely distance label for each pair of nodes
(residues), we would like to estimate the probability distribution of $d_k$, because
the predicted distance probability distribution is more informative than a single
predicted value.
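
The discretization described above can be sketched as a small lookup function. This is an illustrative sketch only: the names `distance_to_label` and `label_range` are hypothetical, the half-open boundary convention (a distance of exactly 4 Å falls in the 4-5 bin, exactly 15 Å in the last bin) is an assumption, and so is placing distances below 3 Å in the first bin, since the text does not say how they are handled.

```python
import math

NUM_BINS = 13  # 3-4, 4-5, ..., 14-15, and >15 Å


def distance_to_label(distance):
    """Map a Cα-Cα distance in Å to a bin index in 0..12.

    Assumptions (not stated in the text): bins are half-open [lo, hi),
    and distances below 3 Å are placed in the first bin.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if distance >= 15.0:
        return NUM_BINS - 1          # the open-ended last bin
    if distance < 3.0:
        return 0                     # assumption: clamp to the first bin
    return int(math.floor(distance - 3.0))


def label_range(label):
    """Human-readable range for a bin index, e.g. 0 -> '3-4'."""
    if not 0 <= label < NUM_BINS:
        raise ValueError("label out of range")
    if label == NUM_BINS - 1:
        return ">15"
    return f"{label + 3}-{label + 4}"


if __name__ == "__main__":
    for d in (3.5, 5.2, 14.9, 20.0):
        label = distance_to_label(d)
        print(d, label, label_range(label))
```

A predictor that outputs a distribution assigns one probability to each of these 13 labels for a residue pair, rather than reporting only the single most likely label.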
Formally, let $p_\theta(d_k \mid x_k)$ be the probability of the distance label $d_k$
conditioned on the feature vector $x_k$. Meanwhile, $\theta$ is the model parameter
vector. We estimate $p_\theta(d_k \mid x_k)$ as follows:

$$p_\theta(d_k \mid x_k) = \frac{\exp\left(L_\theta(d_k, x_k)\right)}{Z_\theta(x_k)} \qquad (2.2)$$

where $Z_\theta(x_k) = \sum_{d} \exp\left(L_\theta(d, x_k)\right)$ is the partition
function and $L_\theta(d, x_k)$ is a two-layer neural network. Figure 2.2 shows an
example of the neural network with three and five neurons in the first and second
hidden layers, respectively. Each neuron is a sigmoid function. The function
$L_\theta(d_k, x_k)$ can be calculated as,