2.3. Secondary structure prediction programs
The input files were converted to the file format required by each secondary structure
prediction method. Predictions of eight of the ten secondary structure
prediction programs for proteins could be done locally on a Silicon Graphics Origin2000
computer at the CMBI in Nijmegen.
The program Pepplot [30] was used to make Chou-Fasman [1,2] predictions.
Predictions of this method made by Pepplot will be referred to as Chou-Fasman predictions.
Slightly adapted Chou-Fasman predictions are produced by the program PeptideStructure
[31,32]. These Chou-Fasman predictions will be referred to as CFpred from this point.
PeptideStructure uses a modified version of the previously mentioned method of Chou and
Fasman: for α-helix predictions not all conditions are used, and for β-sheet predictions a
minimum length of five residues is obligatory.
PeptideStructure also predicts secondary structure according to a modified version
of the Garnier prediction method [5]. Predictions from this method will be referred to as
Garnier predictions. The alterations to the Garnier method by PeptideStructure consist of
the following rules: the minimum length of a helix is six and of a beta-sheet is four, and
regions without adequate predictions are replaced by the conformational state of the next
best probability.
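
The minimum-length rules above can be illustrated with a minimal sketch. It is not the PeptideStructure implementation: the function name `enforce_min_length`, the per-residue score dictionaries, and the single-pass strategy are assumptions made for illustration. Only the minimum lengths (six for a helix, four for a beta-sheet) and the fallback to the next best state come from the text.

```python
from itertools import groupby

# Minimum run lengths taken from the text; other states have no minimum.
MIN_LENGTH = {"H": 6, "E": 4}


def enforce_min_length(scores):
    """Return a state string from per-residue scores (hypothetical helper).

    `scores` is a list with one dict per residue, mapping a state label
    ('H' helix, 'E' sheet, 'C' coil) to a score. Runs of helix or sheet
    that are too short are replaced, residue by residue, with the
    second-best state. This is a single pass; it does not re-check runs
    created by the replacement.
    """
    # Rank the states of every residue from best to worst score.
    ranked = [sorted(s, key=s.get, reverse=True) for s in scores]
    best = [r[0] for r in ranked]

    result = []
    pos = 0
    for state, run in groupby(best):
        length = len(list(run))
        if length < MIN_LENGTH.get(state, 1):
            # Run too short: fall back to the next best state per residue.
            result.extend(ranked[i][1] for i in range(pos, pos + length))
        else:
            result.extend([state] * length)
        pos += length
    return "".join(result)


helix_like = {"H": 0.6, "E": 0.1, "C": 0.3}
coil_like = {"H": 0.2, "E": 0.1, "C": 0.7}
short_helix = enforce_min_length([helix_like] * 3 + [coil_like] * 4)
long_helix = enforce_min_length([helix_like] * 6 + [coil_like] * 2)
```

In this toy input the three-residue helix is too short and falls back to coil, while the six-residue helix is kept.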
Secondary structure predictions by a more recent version of the Garnier secondary
structure prediction method were performed using the program GOR4 [8].
The program DSC (Discrimination of protein Secondary structure Class) combines
several secondary structure prediction principles [14]. From the output file of DSC the
program SecCons (see below) extracts another secondary structure prediction, which uses
slightly different rules. This prediction is called DSC-l to distinguish it from the normal
DSC prediction.
PREDATOR2 [15] is a secondary structure prediction method, which predicts
secondary structure on the basis of hydrogen bonding propensities and non-local interaction
statistics. These propensities were calculated for each of the 400 possible amino-acid pairs.
Furthermore, local pairwise alignments are used to incorporate information from
homologous proteins.
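
The figure of 400 corresponds to the 20 standard amino acids taken as ordered pairs (20 × 20). The short sketch below only enumerates those pairs and sets up an empty lookup table. The names `AMINO_ACIDS`, `PAIRS` and `propensity` are illustrative, and the zero values are placeholders, not PREDATOR data.

```python
from itertools import product

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All ordered pairs, including pairs of identical residues.
PAIRS = list(product(AMINO_ACIDS, repeat=2))

# Placeholder table with one entry per pair; real propensities would
# have to be derived from a database of known structures.
propensity = {pair: 0.0 for pair in PAIRS}

print(len(PAIRS))  # 400
```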
SIMPA96 [20] is a nearest neighbour secondary structure prediction method, which
uses a similarity matrix, similarity threshold and information from a database of known
secondary structures.
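
The general idea of a nearest neighbour prediction with a similarity matrix and a similarity threshold can be sketched as follows. This is a generic toy version and not the SIMPA96 algorithm: the window length, the identity matrix, the voting scheme, the default coil state and all function names are assumptions chosen for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy similarity matrix (assumption): 1 for identical residues, 0 otherwise.
IDENTITY = {(a, a): 1 for a in AMINO_ACIDS}


def window_similarity(a, b, matrix):
    """Sum the pairwise similarities of two equally long windows."""
    return sum(matrix.get((x, y), 0) for x, y in zip(a, b))


def predict(query, database, matrix, window=3, threshold=3):
    """Predict one state per residue by voting over similar windows.

    `database` is a list of (sequence, structure) pairs with known
    secondary structure. Every database window whose similarity to a
    query window reaches `threshold` votes for its own states, weighted
    by the similarity score. Residues without votes default to coil 'C'.
    """
    votes = [Counter() for _ in query]
    for start in range(len(query) - window + 1):
        q = query[start:start + window]
        for seq, struct in database:
            for j in range(len(seq) - window + 1):
                score = window_similarity(q, seq[j:j + window], matrix)
                if score >= threshold:
                    for k in range(window):
                        votes[start + k][struct[j + k]] += score
    return "".join(v.most_common(1)[0][0] if v else "C" for v in votes)


known = [("AAAKK", "HHHCC")]
matched = predict("AAA", known, IDENTITY)
unmatched = predict("GGG", known, IDENTITY)
```

With this toy database the query `AAA` matches the first window of the known sequence and inherits its helix states, while `GGG` finds no window above the threshold and defaults to coil.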
The predictions of the two remaining secondary structure prediction programs were obtained
by making use of e-mail or HTML servers. NNpredict [21] is available via the NNpredict
web server. Sequences were submitted to the server and the retrieved HTML files were
later processed.
Predictions of the aforementioned PHD program [17-19] were obtained from the
PredictProtein web and e-mail server. An e-mail message containing the protein sequence
and name was sent to this server, which returned an e-mail with the secondary structure
prediction.
2.4. Converting secondary structure predictions to Neural Network input
Every program returned its predictions in a distinct file format. In order to use the
predictions and the verified secondary structure as input for a neural network, all the
prediction files for a certain protein were gathered by the program SecCons (JAML,