Divide and Conquer Strategies for Protein Structure Prediction - Mathematical Approaches to Polymer Sequence Analysis and Related Problems


2.4.2.2 PSIpred

PSIpred has been described in [21]. The original implementation is based on neural networks. An almost equivalent implementation with SVM has been described in [48] and compared with the original version.

The neural network topology of PSIpred is very similar to the one used in PHD: in both methods the input is processed at two different levels, and the final result is obtained as the consensus between differently trained networks. The main differences are the lengths of the windows used in the first and second levels: PSIpred uses 15-residue windows in both networks, while PHD uses windows of 13 and 17 residues, respectively. Moreover, the conservation weight is not included in the input of PSIpred (it yielded little improvement in PHD as well [42]). The most important difference between early PHD versions and PSIpred is the way evolutionary information is treated. In particular, the position-specific scoring matrix (PSSM) is used to feed the NN instead of the classical frequency profile computed from the MSA.
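
The window-based input described above can be illustrated with a minimal sketch. Only the 15-residue window length comes from the text; the function name window_features, the zero padding beyond the chain termini and the toy profile are assumptions made for illustration, not a description of the actual PSIpred code.

```python
def window_features(profile, window=15):
    """Build one flat input vector per residue from a sliding window.

    profile: list of per-residue rows (e.g. 20 scaled PSSM values each).
    window:  odd window length; 15 is the value quoted for PSIpred.
    Positions that fall outside the chain are zero-padded (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    if not profile:
        return []
    half = window // 2
    padding = [0.0] * len(profile[0])
    features = []
    for centre in range(len(profile)):
        vector = []
        for pos in range(centre - half, centre + half + 1):
            inside = 0 <= pos < len(profile)
            vector.extend(profile[pos] if inside else padding)
        features.append(vector)
    return features


# Hypothetical 4-residue profile with 2 columns per residue.
toy_profile = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
toy_features = window_features(toy_profile, window=3)
```

With a real 20-column profile and the default window, each residue would be described by 15 x 20 values. The same windowing could in principle be applied to the outputs of the first level to build the input of the second level.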

Here we review in detail the procedure used by Jones to produce meaningful position-specific profiles with PSI-BLAST, as described in [21]. Although PSI-BLAST is much more sensitive than BLAST in picking up distant evolutionary relationships, it must be used carefully in order to avoid false-positive matches. In particular, PSI-BLAST is prone to incorporating repetitive sequences into the intermediate profiles. When this happens, the search tends to return high-scoring matches with completely random sequences. In order to maximise the performance of PSI-BLAST, Jones builds a custom sequence databank by first compiling a large set of non-redundant protein sequences and then filtering it to remove low-complexity regions [49], transmembrane segments [22] and regions that are likely to form coiled coils (these filtering steps are now performed automatically by PSI-BLAST).

Finally, the input of the NN is computed from the PSSM produced by PSI-BLAST after three iterations, scaled to values between 0 and 1 with the logistic function $1/(1 + e^{-x})$, where $x$ is the raw profile value.
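
As a small illustration of this scaling step, the sketch below applies the standard logistic function to every entry of a profile. The function names and the toy matrix are hypothetical; a real PSSM would have 20 columns per residue and come from PSI-BLAST output.

```python
import math


def logistic_scale(raw_score):
    """Map a raw profile value to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw_score))


def scale_pssm(pssm):
    """Apply the logistic scaling to every entry of a PSSM (list of rows)."""
    return [[logistic_scale(value) for value in row] for row in pssm]


# Hypothetical excerpt of a PSSM: 2 residues, 3 columns.
toy_pssm = [[-3, 0, 5], [2, -1, 0]]
scaled_pssm = scale_pssm(toy_pssm)
```

A raw score of 0 maps to exactly 0.5, strongly negative scores approach 0 and strongly positive scores approach 1, so the network input stays bounded whatever the range of the raw profile.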

2.5 Residue-Residue Contact Prediction

Residue-residue contact prediction refers to the prediction of the probability that two residues in a protein structure are spatially close to each other. Inter-residue contacts provide much information about the protein structure. A contact between two residues that are distant in the protein sequence can be seen as a strong constraint on the protein fold. If we could predict with high precision even a small set of (non-trivial) residue pairs in contact, we could use this information as extra constraints to guide protein structure prediction. The prediction of inter-residue contacts is a difficult problem, and no satisfactory improvements have been achieved in the last 10 years of investigation. On the other hand, even though residue contact predictors are highly inaccurate, they are still more accurate than contact predictions derived from the best 3D structure prediction methods [45].
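
To make the notion of a contact concrete, the sketch below derives the set of contacts from known 3D coordinates, which is how reference contacts for evaluating a predictor are typically obtained. The text only says "spatially close"; the 8 Å distance threshold, the minimum sequence separation of 6 used to discard trivial pairs, the function name and the toy coordinates are all assumptions made for illustration.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=6):
    """Return the set of residue index pairs (i, j), i < j, in contact.

    coords:         one (x, y, z) point per residue, in angstroms.
    threshold:      distance cutoff below which a pair counts as a contact
                    (assumed value, not taken from the text).
    min_separation: minimum distance along the sequence, used to skip
                    trivial contacts between sequence neighbours (assumed).
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts.add((i, j))
    return contacts


# Hypothetical 7-residue hairpin: the chain runs out and folds back,
# so the first and last residues end up 3.8 angstroms apart.
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
toy_contacts = contact_map(toy_coords)
```

In this toy chain the only non-trivial contact is the pair (0, 6): two residues far apart in sequence but close in space, which is exactly the kind of constraint on the fold discussed above.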
