Biomedical Engineering Reference
Most of the 13 methods are based on NN. The exceptions are PORTER,
SCRATCH, SSPro4 (based on bidirectional recurrent NN), SAM-T99sec (based
on HMM) and Yaspin (based on both NN and HMM).
2.4.2 Secondary Structure Prediction Methods
In this section, we describe in detail two of the best-known secondary structure
prediction methods: PHD (footnote 6) and PSIpred (footnote 7). Both methods are
based on NN and share a similar network topology. The main difference between the
two methods is the way evolutionary information is extracted from the MSA and
encoded into the NN input. Early versions of PHD used pre-computed HSSP multiple
alignments generated by MAXHOM. PSIpred uses the position-specific scoring matrix
(PSSM) computed internally by PSI-BLAST. As discussed in [41], the improvement of
PSIpred over PHD is mostly due to the better alignments used to feed the NN. The
better quality of the alignments is due in part to the growth of the databases and
to the filtering strategy used by Jones to keep unrelated proteins from polluting
the profile. A more recent version of PHD uses PSSM input and is called PHDpsi
to distinguish it from the older implementation. The only difference between PHD
and PHDpsi is the use of PSSM input instead of frequency profile input.
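
To make the notion of a frequency profile concrete, the following minimal sketch computes per-column amino acid frequencies from a small alignment. It is an illustration only, not the HSSP/MAXHOM procedure: the function name `frequency_profile`, the toy alignment, the choice to ignore gaps, and the absence of sequence weighting are all assumptions made here. A PSSM, by contrast, holds log-odds scores rather than raw frequencies and is not reproduced in this sketch.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def frequency_profile(alignment):
    """Return one 20-element frequency vector per alignment column.

    Gaps and non-standard symbols are ignored, and all sequences carry
    equal weight (both are simplifying assumptions of this sketch).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        counts = Counter(
            seq[col] for seq in alignment if seq[col] in AMINO_ACIDS
        )
        total = sum(counts.values())
        # A column containing only gaps yields an all-zero vector.
        profile.append(
            [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]
        )
    return profile


# Toy alignment of three hypothetical sequences (not real data).
toy_msa = ["MKV-A", "MRVLA", "MKIL-"]
profile = frequency_profile(toy_msa)
```

Each row of `profile` corresponds to one alignment column and would supply the 20 profile values for that residue position in the NN input.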
For all the other secondary structure predictors as well, the main source of
information is the sequence profile or the PSSM. The approaches differ mainly in
the technique used to extract knowledge from these two sources of information, and
that technique is specific to the machine learning method used. Here we describe
only PHD and PSIpred because, historically, they represent the two most important
steps forward in secondary structure prediction.
2.4.2.1 PHD
PHD is described in [42]. The PHD method processes the input information at two
levels, corresponding to two different neural networks: (1) a sequence-to-structure
NN and (2) a structure-to-structure NN. The final prediction (3) is obtained by
filtering the solution given by the consensus between differently trained neural
networks.
1. At the first level, the input units of the NN encode local information taken from
sequence profiles (from the PSSM in PHDpsi). For each residue position i, the local
information is extracted from a window of 13 adjacent residues centered on i.
For each residue position in the window, 22 input units are used: 20 units
encode the corresponding column in the sequence profile, 1 unit is used to detect
6 http://www.predictprotein.org/
7 http://bioinf.cs.ucl.ac.uk/psipred/
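
The sliding-window input described for the first level can be sketched as follows. This is a hedged illustration, not the PHD implementation: it covers only the 20 profile units per window position, so it produces 13 x 20 = 260 values rather than the full 13 x 22. The function name `window_input`, the zero padding beyond the sequence ends, and the toy profile are assumptions introduced here.

```python
WINDOW = 13          # residues per window, as stated in the text
PROFILE_UNITS = 20   # profile values per window position


def window_input(profile, i, window=WINDOW):
    """Flatten the profile columns of the window centered on residue i.

    Window positions that fall outside the sequence are filled with zeros
    (an assumption of this sketch). The two additional units per position
    mentioned in the text are not modelled.
    """
    if not 0 <= i < len(profile):
        raise IndexError("residue index out of range")
    half = window // 2
    units = []
    for pos in range(i - half, i + half + 1):
        if 0 <= pos < len(profile):
            units.extend(profile[pos])
        else:
            units.extend([0.0] * PROFILE_UNITS)
    return units


# Hypothetical profile: 30 positions, each a uniform 20-value column.
toy_profile = [[0.05] * PROFILE_UNITS for _ in range(30)]
x = window_input(toy_profile, 0)
```

For the first residue (`i = 0`), the six window positions to the left of the sequence start are padded, so the first 120 values of `x` are zero in this sketch.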