solution randomly and then ran the training algorithm on a supercomputer for about
two weeks. The training algorithm terminated when the likelihood of either the
training set or the validation set stopped improving. Note that all the model
parameters are learned from the training set alone, not the validation set; the
validation set is used only to determine when the training algorithm shall
terminate. Training usually terminates after about 3,000 iterations. We also reran
the training algorithm from nine different initial solutions and observed no clear
performance difference among these runs. See our work on EPAD [5] for more details.
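The stopping criterion above can be sketched as a simple early-stopping loop. This is a minimal illustration, not the authors' implementation; the `model` interface (`update`, `log_likelihood`) is hypothetical, and for brevity only the validation likelihood is monitored:

```python
def train_with_early_stopping(model, train_set, valid_set, max_iter=3000):
    """Early-stopping sketch: parameters are fit on the training set only,
    while the held-out validation likelihood merely decides when to stop.
    `model` is a hypothetical object exposing update() and log_likelihood()."""
    best_ll = float("-inf")
    for _ in range(max_iter):
        model.update(train_set)               # one training step on training data
        ll = model.log_likelihood(valid_set)  # monitored, never optimized
        if ll <= best_ll:
            break                             # no improvement: terminate
        best_ll = ll
    return model
```

Because the validation set never contributes a gradient, it gives an unbiased signal of when further training stops generalizing.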
We use two kinds of input features in this neural network model: PSI-BLAST
sequence profile and residue co-evolution. One is the context-specific sequence
profile for a small sequence segment centered at one specific residue in question.
The sequence profile is generated by running PSI-BLAST on the NR database with
5 iterations and an E-value of 0.001. The other feature we used is residue
co-evolution information. Mutual information is a classical method to measure
residue co-evolution strength. However, mutual information cannot differentiate
direct from indirect interactions. For example, when residue a interacts strongly
with residue b and b interacts strongly with residue c, residues a and c are also
likely to show apparent interaction. To reduce the impact of this kind of indirect
signal, global statistical methods such as Graphical Lasso [3] and pseudo-likelihood
[8, 9] methods have been proposed to estimate residue co-evolution strength. However,
these methods are time-consuming. In this work, to account for the chaining effect of
residue coupling, we use the powers of the mutual information matrix. In particular,
let MI denote the mutual information matrix; we use MI^k, where k ranges from 2 to
11, to estimate the chaining effect.
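Computing these matrix powers is straightforward. The sketch below (an illustration under the stated definition, not the authors' code) builds MI^2 through MI^11 for a toy mutual information matrix; the entry (i, j) of MI^k accumulates coupling along chains of k - 1 intermediate residues, which is exactly the chaining effect described above:

```python
import numpy as np

def mi_chain_features(mi, max_power=11):
    """Given a symmetric mutual information matrix `mi` (L x L),
    return the list [MI^2, ..., MI^max_power]. Entry (i, j) of MI^k
    sums products of MI along length-k chains i -> ... -> j, so it
    captures indirect (chained) coupling between residues i and j."""
    powers = []
    current = mi.copy()
    for _ in range(2, max_power + 1):
        current = current @ mi  # MI^k = MI^(k-1) @ MI
        powers.append(current.copy())
    return powers

# Toy 3-residue example: a-b and b-c couple strongly, a-c only weakly.
mi = np.array([[0.0, 0.8, 0.1],
               [0.8, 0.0, 0.9],
               [0.1, 0.9, 0.0]])
feats = mi_chain_features(mi)  # ten matrices, MI^2 .. MI^11
```

In the toy example, the (a, c) entry of MI^2 is much larger than the raw MI(a, c), reflecting the indirect a-b-c chain.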
2.3 Scoring Similarity of Two Markov Random Fields
This section introduces how to align two proteins by aligning their corresponding
MRFs. As shown in the left picture of Fig. 2.3, building an alignment is
equivalent to finding a unique path from the top-left corner to the bottom-right
corner. For each vertex along the path, we need a score to measure how good it is to
transit to the next vertex. That is, we need to measure how similar two nodes of the
two MRFs are. We call this kind of scoring function the node alignment potential.
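When only the node alignment potential is used, the optimal path through the alignment matrix can be found with standard dynamic programming. The sketch below illustrates this with a Needleman-Wunsch-style recurrence; `node_score` is a hypothetical node-similarity function standing in for the node alignment potential:

```python
import numpy as np

def align_by_node_potential(n1, n2, node_score, gap=-1.0):
    """Dynamic-programming sketch of aligning two MRFs with n1 and n2
    nodes using only a node alignment potential. Each diagonal move
    (i-1, j-1) -> (i, j) is scored by node_score(i-1, j-1), a
    hypothetical similarity between node i-1 of MRF 1 and node j-1 of
    MRF 2; horizontal/vertical moves incur a gap penalty. Returns the
    optimal alignment score at the bottom-right corner."""
    dp = np.zeros((n1 + 1, n2 + 1))
    dp[:, 0] = gap * np.arange(n1 + 1)
    dp[0, :] = gap * np.arange(n2 + 1)
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            dp[i, j] = max(dp[i - 1, j - 1] + node_score(i - 1, j - 1),  # match
                           dp[i - 1, j] + gap,                           # gap in MRF 2
                           dp[i, j - 1] + gap)                           # gap in MRF 1
    return dp[n1, n2]
```

This works because the node potential decomposes over single vertices of the path; the edge alignment potential introduced next breaks that decomposition.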
Second, in addition to measuring the similarity of two aligned MRF nodes, we want
to quantify the similarity between two MRF edges. For example, in the right picture
of Fig. 2.3, residues L and S of the first protein are aligned to residues A and Q
of the 2nd protein, respectively. We would like to estimate how good it is to
align the pair (L, S) to the pair (A, Q). This pairwise similarity function is a
function of two MRF edges and we call it the edge alignment potential. When the
edge alignment potential is used to score the similarity of two MRFs, the Viterbi
algorithm or simple dynamic programming cannot be used to find the optimal
alignment. It can