such as PSICOV, Evfold, and plmDCA [2–4] as residue co-evolution. PSICOV assumes that $P(X)$ is a Gaussian distribution and calculates the correlation between two columns from the inverse covariance matrix. By contrast, plmDCA does not assume a Gaussian distribution and is more efficient and also slightly more accurate. Generally speaking, these programs are time-consuming.
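
To make the inverse-covariance idea concrete, the following is a minimal sketch and not the procedure of any of the programs named above. It one-hot encodes an alignment, inverts a ridge-regularised covariance matrix (PSICOV itself estimates a sparse inverse covariance), and scores each pair of columns by the norm of the corresponding block. The names `one_hot`, `inverse_covariance_scores` and `ridge`, and the choice of the Frobenius norm, are assumptions made for illustration; sequence weighting and the corrections used by the published programs are left out.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus gap


def one_hot(alignment):
    """Encode an alignment (equal-length strings) as an (N, L*21) 0/1 matrix."""
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    q = len(ALPHABET)
    n, length = len(alignment), len(alignment[0])
    x = np.zeros((n, length * q))
    for s, seq in enumerate(alignment):
        for pos, aa in enumerate(seq):
            # Unknown characters are treated as gaps (an assumption).
            x[s, pos * q + index.get(aa, index["-"])] = 1.0
    return x


def inverse_covariance_scores(alignment, ridge=0.1):
    """Return an (L, L) matrix of coupling scores between alignment columns.

    Hypothetical helper: the ridge term stands in for the sparse
    estimation used by real programs and only keeps the matrix invertible.
    """
    q = len(ALPHABET)
    x = one_hot(alignment)
    length = x.shape[1] // q
    cov = np.cov(x, rowvar=False, bias=True)
    precision = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    # View the precision matrix as L x L blocks of size q x q.
    blocks = precision.reshape(length, q, length, q)
    scores = np.sqrt((blocks ** 2).sum(axis=(1, 3)))  # Frobenius norm per block
    np.fill_diagonal(scores, 0.0)  # a column is not scored against itself
    return scores


toy_alignment = ["ALGV", "ALGI", "GVAV", "GVAI"]
toy_scores = inverse_covariance_scores(toy_alignment)
```

In the toy alignment the first three columns vary together while the fourth varies independently, so the score between columns 0 and 1 should exceed the score between columns 0 and 3. Inverting an $(21L) \times (21L)$ matrix is also one reason such programs are slow on long proteins.
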
The reliability of mutual information (MI) or direct information (DI) [2] depends on the number of non-redundant sequence homologs. When there are few sequence homologs, the resulting MI or DI is not very accurate. Therefore, residue co-evolution strength alone is not enough to estimate residue interaction strength. We can use other contact prediction programs such as PhyCMAP [4], which integrates residue co-evolution information, the PSI-BLAST sequence profile and other information to predict the probability that two residues are in contact. PhyCMAP works much better than PSICOV and Evfold when the proteins under study have a small number of sequence homologs [4].
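
As an illustration of why MI depends on the number of homologs, the sketch below computes the plain mutual information between two alignment columns from observed frequencies. The function name `mutual_information` is a hypothetical helper; sequence weighting for redundancy and pseudocounts, which practical implementations usually add, are omitted here.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns.

    col_a and col_b hold the residues observed at two positions,
    one entry per sequence in the alignment.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


covarying = mutual_information("AAGG", "LLVV")    # columns change together
independent = mutual_information("AAGG", "LVLV")  # columns change independently
```

Because every probability is estimated from counts, an alignment with only a handful of non-redundant sequences gives frequencies that are too noisy for the resulting MI to be trusted.
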
In this work, we use the predicted inter-residue Euclidean distance to reflect the interaction strength of two residues. This is based upon the assumption that two spatially close residues tend to have a strong interaction. We predict the inter-residue distance using sequence information such as mutual information and its power series, the PSI-BLAST sequence profile and other protein features. See [5] for more details. Below we briefly describe how to predict inter-residue distance from sequence information using probabilistic neural networks (PNN).
We discretize the $C_\alpha$–$C_\alpha$ distance into 13 bins (3–4, 4–5, 5–6, …, 14–15, and >15 Å). Each bin is also called a label. Given a protein and a pair of residues $i$ and $j$, let $d_k$ denote the bin into which their distance falls, and $x_k$ denote the protein feature vector consisting of some position-specific sequence profile information and also the mutual information between the two positions. We would like to estimate the probability of observing $d_k$ given the feature vector $x_k$. That is, instead of only considering the most likely distance label assigned to each pair of nodes (residues), we would like to estimate the probability distribution of $d_k$. The reason is that the predicted distance probability distribution is more informative than a single predicted value.
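
The discretization can be sketched as follows. The text does not say how bin boundaries or distances below 3 Å are handled, so this sketch assumes half-open bins $[lo, hi)$, places distances of exactly 15 Å or more in the last bin, and clamps anything below 3 Å into the first bin. The names `distance_to_label` and `LABEL_NAMES` are hypothetical.

```python
import math

# Twelve 1-angstrom bins from 3 to 15, plus one open-ended bin: 13 labels.
LABEL_NAMES = [f"{lo}-{lo + 1}" for lo in range(3, 15)] + [">15"]


def distance_to_label(distance):
    """Map a C-alpha to C-alpha distance in angstroms to a label in 0..12."""
    if math.isnan(distance) or distance < 0:
        raise ValueError("distance must be a non-negative number")
    if distance >= 15:
        return len(LABEL_NAMES) - 1  # open-ended last bin (assumed to include 15.0)
    # Distances below 3 angstroms are clamped into the first bin (an assumption).
    return max(0, math.floor(distance) - 3)


example_labels = [distance_to_label(d) for d in (3.5, 4.0, 14.9, 15.0, 20.0)]
```

A predictor then outputs one probability per entry of `LABEL_NAMES` for each residue pair, rather than a single distance value.
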
Formally, let $p_\theta(d_k \mid x_k)$ be the probability of the distance label $d_k$ conditioned on the feature vector $x_k$. Meanwhile, $\theta$ is the model parameter vector. We estimate $p_\theta(d_k \mid x_k)$ as follows:
$$p_\theta(d_k \mid x_k) = \frac{\exp\left(L_\theta(d_k, x_k)\right)}{Z_\theta(x_k)} \qquad (2.2)$$
where $Z_\theta(x_k) = \sum_{d} \exp\left(L_\theta(d, x_k)\right)$ is the partition function and $L_\theta(d, x_k)$ is a two-layer neural network. Figure 2.2 shows an example of the neural network with three and five neurons in the first and second hidden layers, respectively. Each neuron is a sigmoid function. The function $L_\theta(d_k, x_k)$ can be calculated as,