Protein Secondary Structure Prediction:
Comparison of Ten Common Prediction
Algorithms Using a Neural Network
Jorn R. DE HAAN 1 and Jack A.M. LEUNISSEN 2
1 Laboratory of Analytical Chemistry, Radboud University Nijmegen, Toernooiveld, 6525
ED Nijmegen, the Netherlands, and 2 Laboratory of Bioinformatics, Wageningen University,
Dreijenlaan 3, 6703 HA Wageningen, the Netherlands
Abstract. Protein secondary structure prediction is believed to improve when
different predictions are combined into a consensus secondary structure prediction. Ten
different protein secondary structure prediction programs were compared and given
weights by a feed-forward neural network. A dataset of approximately 6000 proteins
was taken from the DSSP database and was used to train the neural network. The
resulting weights indicate that the secondary structure prediction programs PHD and
Predator performed better than the other methods. However, training of the neural
network with a smaller but more stringently selected dataset did not support these
results for the Predator program. The performance of the program PHD remained
the same when the smaller dataset was used to train the neural network.
1. Introduction
1.1. Secondary structure prediction
The “Holy Grail” in bioinformatics has for years been (and still is) the ab initio prediction of
protein 3D structure, i.e. constructing the folding structure of a protein based upon the
amino acid sequence alone. One important step towards attaining this goal is the prediction of
protein secondary structure from the primary structure. Several methods have been
developed to make and improve secondary structure predictions for proteins; these are
amongst the oldest algorithms used in bioinformatics, the oldest ones dating back to the
early seventies (e.g. Chou & Fasman 1974, Lim 1974, Garnier 1978) [1-5]. Improvement
of secondary structure prediction is relevant and interesting because secondary structure
predictions allow for a wide variety of conclusions on the fold classification and function of
a protein and, in particular, provide important information for 3D-structure prediction [6].
Furthermore, the results of secondary structure prediction have been an aid for designing
new proteins [7], predicting the effect of point mutations, identifying the protein class (for
instance, all-α or all-β proteins), and predicting epitopes [8]. In this report we investigate the
possibilities of combining predictions from secondary structure prediction methods to form
a consensus prediction. The goal of a consensus method is to improve the final prediction
result in comparison with the individual predictions.
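As a minimal illustration of what a consensus prediction involves, the Python sketch below combines per-residue predictions by weighted voting. It is not the feed-forward neural network used in this work: the weights are fixed by hand rather than learned, and the function name, method names, weight values, three-state alphabet (H, E, C) and tie-breaking order are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Three-state alphabet assumed for this sketch: H = helix, E = sheet, C = other.
# The order also serves as the tie-breaking order (an arbitrary choice).
STATES = "HEC"


def consensus_prediction(predictions, weights):
    """Combine per-residue predictions by weighted voting.

    predictions: dict mapping a method name to a string of states,
                 one character per residue, all of equal length.
    weights:     dict mapping a method name to a non-negative weight;
                 methods without an entry get a weight of 1.0.
    """
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same residues")
    n_residues = lengths.pop()

    result = []
    for i in range(n_residues):
        # Sum the weights of the methods voting for each state at residue i.
        score = defaultdict(float)
        for name, pred in predictions.items():
            score[pred[i]] += weights.get(name, 1.0)
        # Pick the state with the highest total weight; on a tie, max()
        # keeps the first state in STATES.
        result.append(max(STATES, key=lambda s: score[s]))
    return "".join(result)


# Hypothetical predictions for a four-residue fragment.
example_predictions = {
    "method_a": "HHHC",
    "method_b": "HEEC",
    "method_c": "CEEC",
}
example_weights = {"method_a": 2.0, "method_b": 1.0, "method_c": 0.5}

print(consensus_prediction(example_predictions, example_weights))
print(consensus_prediction(example_predictions, {}))
```

With the hand-picked weights, the heavily weighted method_a dominates and the consensus follows its prediction; with equal weights, the majority of methods decides each residue. A trained network would in effect replace the fixed weights with values fitted to known structures.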
Secondary structure prediction for proteins can be divided into two types of
prediction. First, there is secondary structure class prediction, in which a protein is
characterized as an all-α, all-β or α/β class protein. Second, there is 'normal' secondary
structure prediction, in which the secondary structure state (α-helix, β-sheet or other) is