Biomedical Engineering Reference
In-Depth Information
2.4.2.2 PSIpred
PSIpred was described in [21]. The original implementation is based on neural
networks (NNs); a nearly equivalent implementation based on support vector
machines (SVMs) was described in [48] and compared with the original version.
The neural network topology of PSIpred is very similar to the one used in PHD:
in both methods the input is processed at two levels, and the final result is
obtained as the consensus of differently trained networks. A first difference
concerns the window lengths used at the first and second levels: PSIpred uses
15-residue windows in both networks, while PHD uses windows of 13 and 17 residues,
respectively. Moreover, the conservation weight is not included in the input of
PSIpred (it yielded little improvement in PHD as well [42]). The most important
difference between the early version of PHD and PSIpred is how evolutionary
information is handled: PSIpred feeds the NN with the position-specific scoring
matrix (PSSM) produced by PSI-BLAST, rather than with the classical frequency
profile computed from a multiple sequence alignment (MSA).
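The two-level windowing scheme described above can be sketched as follows. This is a minimal NumPy illustration, not PSIpred's own code: the per-position encoding (the profile values plus one extra flag marking window slots that fall beyond the chain ends) and the use of a plain average as the consensus of the trained networks are assumptions, and `window_features` and `consensus` are hypothetical names.

```python
import numpy as np

WINDOW = 15  # PSIpred window length at both levels (from the text)


def window_features(profile: np.ndarray, window: int = WINDOW) -> np.ndarray:
    """Build one input vector per residue from an (L, k) per-residue matrix.

    Each window slot contributes the k values of the residue it covers, plus
    one extra flag that is set to 1 when the slot lies outside the chain
    (an assumed terminus encoding).
    """
    length, k = profile.shape
    half = window // 2
    feats = np.zeros((length, window, k + 1))
    for i in range(length):
        for w in range(window):
            j = i - half + w  # residue covered by window slot w
            if 0 <= j < length:
                feats[i, w, :k] = profile[j]
            else:
                feats[i, w, k] = 1.0  # padding flag beyond the N/C terminus
    return feats.reshape(length, window * (k + 1))


def consensus(predictions: list[np.ndarray]) -> np.ndarray:
    """Average the (L, 3) outputs of several independently trained networks."""
    return np.mean(np.stack(predictions), axis=0)
```

With a scaled PSSM of shape (L, 20), the first level receives 15 × 21 = 315 inputs per residue; passing the first level's (L, 3) helix/strand/coil outputs through the same function gives 15 × 4 = 60 inputs per residue for the second level.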
Here we review in detail the procedure used by Jones to produce meaningful
position-specific profiles with PSI-BLAST, as described in [21]. Although PSI-BLAST
is much more sensitive than BLAST in detecting distant evolutionary relationships,
it must be used carefully to avoid false-positive matches. In particular, PSI-BLAST
is very prone to incorporating repetitive sequences into the intermediate profiles.
When this happens, the search tends to return high-scoring matches to essentially
random sequences. To maximise the performance of PSI-BLAST, Jones builds a custom
sequence data bank by first compiling a large set of non-redundant protein
sequences and then filtering it to remove low-complexity regions [49],
transmembrane segments [22] and regions likely to form coiled coils (this
filtering is now performed automatically by PSI-BLAST).
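A much simplified illustration of low-complexity filtering is sketched below. It is a stand-in, not the SEG algorithm of [49]: it masks every residue covered by a sliding window whose Shannon entropy falls below a threshold. The function names, the window length and the threshold are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2,
                        mask_char: str = "X") -> str:
    """Replace residues in low-entropy windows with mask_char.

    Window length and threshold are assumed, illustrative values.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for j in range(start, start + window):
                masked[j] = mask_char
    return "".join(masked)
```

A poly-glutamine run, for instance, is masked in full, while a window of twelve distinct residues (entropy log2 12 ≈ 3.58 bits) is left untouched. Transmembrane and coiled-coil filtering require dedicated predictors and are not shown.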
Finally, the input to the NN is computed from the PSSM produced by PSI-BLAST after
three iterations, scaled to values between 0 and 1 with the logistic function
$1/(1 + e^{-x})$, where $x$ is the raw profile value.
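The scaling step amounts to a single vectorised expression. The sketch below assumes the raw PSSM has already been parsed into an (L, 20) NumPy array (parsing the PSI-BLAST output is not shown); `scale_pssm` is a hypothetical name.

```python
import numpy as np


def scale_pssm(raw_pssm: np.ndarray) -> np.ndarray:
    """Map raw PSSM scores x to (0, 1) with the logistic function 1 / (1 + e^-x)."""
    x = np.asarray(raw_pssm, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))
```

A raw score of 0 maps to 0.5; positive scores map above 0.5 and negative scores below it, so the ordering of the raw scores is preserved.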
2.5 Residue-Residue Contact Prediction
Residue-residue contact prediction refers to estimating the probability that
two residues in a protein structure are spatially close to each other. Inter-residue
contacts carry substantial information about the protein structure: a contact
between two residues that are distant in the protein sequence can be seen as a
strong constraint on the protein fold. If we could predict with high precision even
a small set of non-trivial (i.e., sequence-distant) residue pairs in contact, we
could use this information as extra constraints to guide protein structure
prediction. The prediction of inter-residue contacts is a difficult problem, and no
satisfactory improvements have been achieved over the last 10 years of
investigation. On the other hand, although residue contact predictors are still
highly inaccurate, they are more accurate than the contact predictions derived from
the best 3D structure prediction methods [45].
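To make the notion of a contact concrete, the sketch below derives a contact map from 3D coordinates and measures the precision of the top-scoring predicted pairs. The definition used here (representative atoms, e.g. C-beta, closer than 8 Å, with only pairs at least 6 residues apart counted as non-trivial) is an assumed convention that the text does not specify; `contact_map` and `top_k_precision` are hypothetical names.

```python
import numpy as np


def contact_map(coords: np.ndarray, threshold: float = 8.0) -> np.ndarray:
    """Boolean (L, L) map: True where two residues are in contact.

    coords holds one representative atom per residue (assumed C-beta), shape (L, 3);
    a contact is assumed to mean a distance below `threshold` angstroms.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist < threshold


def top_k_precision(scores: np.ndarray, true_contacts: np.ndarray,
                    k: int, min_sep: int = 6) -> float:
    """Fraction of the k highest-scoring pairs (i, j), j - i >= min_sep, that are true contacts."""
    length = scores.shape[0]
    i_idx, j_idx = np.triu_indices(length, k=min_sep)  # pairs with j - i >= min_sep
    order = np.argsort(-scores[i_idx, j_idx])[:k]      # best-scoring pairs first
    hits = true_contacts[i_idx[order], j_idx[order]]
    return float(hits.mean())
```

Evaluating only a small number of top-ranked pairs (for example k = L/5 for a chain of length L) reflects the point made above: even a few reliable non-trivial contacts would be useful as structural constraints.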