Divide and Conquer Strategies for Protein Structure Prediction - Mathematical Approaches to Polymer Sequence Analysis and Related Problems


Most of the 13 methods are based on NNs. The exceptions are PORTER, SCRATCH and SSPro4 (based on bidirectional recurrent NNs), SAM-T99sec (based on HMMs) and Yaspin (based on both NNs and HMMs).

2.4.2 Secondary Structure Prediction Methods

In this section, we describe in detail two of the best-known secondary structure prediction methods: PHD⁶ and PSIpred.⁷ Both methods are based on NNs and share a similar network topology. The main difference between them is the way evolutionary information is extracted from the MSA and encoded into the NN input.

The early version of PHD used precomputed HSSP multiple alignments generated by MAXHOM, whereas PSIpred uses the position-specific scoring matrix (PSSM) computed internally by PSI-BLAST. As discussed in [ 41 ], the improvement of PSIpred over PHD is mostly due to the better alignments used to feed the NN. The higher quality of these alignments is due in part to the growth of the sequence databases and in part to the filtering strategy Jones used to keep unrelated proteins from polluting the profile. A more recent version of PHD uses PSSM input and is called PHDpsi to distinguish it from the older implementation. The only difference between PHD and PHDpsi is the use of PSSM input instead of frequency profile input.
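To make the distinction between the two input types concrete, here is a minimal Python sketch, assuming the alignment is given as equal-length gapped strings. `frequency_profile` computes per-column residue frequencies, the kind of input the original PHD used. `log_odds_matrix` is a deliberately simplified, hypothetical stand-in for a PSSM (log-odds of pseudocount-smoothed frequencies against a uniform background); it is not PSI-BLAST's actual procedure, which among other things weights sequences and uses its own pseudocount scheme. Both function names are illustrative.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, fixed column order


def frequency_profile(msa):
    """Per-column amino acid frequencies of an MSA (gaps and unknown symbols ignored)."""
    profile = []
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append([counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS])
    return profile


def log_odds_matrix(msa, pseudocount=1.0, background=None):
    """Simplified PSSM-like scores: log2(smoothed frequency / background frequency).

    Hypothetical simplification: no sequence weighting, uniform background by default.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    scores = []
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        # Add one pseudocount per residue type so unseen residues get finite scores.
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores.append([
            math.log2((counts[aa] + pseudocount) / total / background[aa])
            for aa in AMINO_ACIDS
        ])
    return scores
```

For example, with the toy alignment `["ACDA", "ACEA", "GCD-"]`, the first profile column gives A a frequency of 2/3 and G a frequency of 1/3, while in the second column the log-odds score of the fully conserved C is positive and the scores of unobserved residues are negative.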

For all the other secondary structure predictors as well, the main source of information is the sequence profile or the PSSM. The approaches differ mainly in the technique used to extract knowledge from these two sources of information, and that technique is specific to the machine learning method used. Here we describe only PHD and PSIpred because, historically, they represent the two most important steps forward in secondary structure prediction.

2.4.2.1 PHD

PHD is described in [ 42 ]. The PHD method processes the input information at two different levels, corresponding to two different neural networks: (1) a sequence-to-structure NN and (2) a structure-to-structure NN. The final prediction (3) is obtained by filtering the consensus solution of several differently trained neural networks.
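The first-level input windowing and the consensus step can be sketched in Python as follows. This is an illustration under stated assumptions, not PHD's implementation: the 13-residue window comes from the text, but PHD uses 22 units per window position, whereas this sketch uses only the 20 profile values plus one hypothetical flag marking positions beyond the chain ends. The networks themselves are omitted, the consensus is assumed to be a simple average of per-residue three-state scores (H, E, L) followed by picking the highest-scoring state, and the filtering step is not modeled. The function names `window_input` and `jury` are hypothetical.

```python
WINDOW = 13            # window length used by PHD's first level (from the text)
HALF = WINDOW // 2     # 6 residues on each side of the central position i
N_PROFILE = 20         # one unit per amino acid profile column
STATES = "HEL"         # assumed state order: helix, strand, other


def window_input(profile, i):
    """Flatten the 13-position window centred on residue i into one input vector.

    Each window position contributes 20 profile values plus 1 hypothetical flag
    (1.0 when the position falls outside the chain, else 0.0), i.e. 21 units
    instead of PHD's 22.
    """
    vec = []
    for j in range(i - HALF, i + HALF + 1):
        if 0 <= j < len(profile):
            vec.extend(profile[j])
            vec.append(0.0)
        else:
            vec.extend([0.0] * N_PROFILE)
            vec.append(1.0)
    return vec


def jury(predictions):
    """Average per-residue (H, E, L) scores over several networks and pick the
    highest-scoring state for each residue (assumed consensus rule)."""
    n_nets = len(predictions)
    result = []
    for r in range(len(predictions[0])):
        avg = [sum(net[r][k] for net in predictions) / n_nets for k in range(len(STATES))]
        result.append(STATES[avg.index(max(avg))])
    return "".join(result)
```

With this encoding every residue yields 13 × 21 = 273 input values; for the first residue of a chain, the six window positions before the chain start are zero-filled and flagged.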

1. At the first level, the input units of the NN encode local information taken from sequence profiles (from the PSSM in PHDpsi). For each residue position i, the local information is extracted from a window of 13 adjacent residues centered on i. For each residue position in the window, 22 input units are used: 20 units encode the corresponding column in the sequence profile, 1 unit is used to detect

6 http://www.predictprotein.org/

7 http://bioinf.cs.ucl.ac.uk/psipred/
