Protein Secondary Structure Prediction: Comparison of Ten Common Prediction Algorithms Using a Neural Network - Essays in Bioinformatics

Protein Secondary Structure Prediction: Comparison of Ten Common Prediction Algorithms Using a Neural Network

Jorn R. DE HAAN 1 and Jack A.M. LEUNISSEN 2

1 Laboratory of Analytical Chemistry, Radboud University Nijmegen, Toernooiveld, 6525 ED Nijmegen, the Netherlands, and 2 Laboratory of Bioinformatics, Wageningen University, Dreijenlaan 3, 6703 HA Wageningen, the Netherlands

Abstract. Protein secondary structure prediction is believed to improve when different predictions are combined into a consensus secondary structure prediction. Ten different protein secondary structure prediction programs were compared and given weights by a feed-forward neural network. A dataset of approximately 6000 proteins was taken from the DSSP database and used to train the neural network. The resulting weights indicate that the secondary structure prediction programs PHD and Predator performed better than the other methods. However, training the neural network with a smaller but more stringently selected dataset did not support these results for the Predator program. The performance of the program PHD remained the same when the smaller dataset was used to train the neural network.

1. Introduction

1.1. Secondary structure prediction

For years the “Holy Grail” of bioinformatics was (and still is) the ab initio prediction of protein 3D structure, i.e. constructing the folded structure of a protein from the amino acid sequence alone. One important step toward this goal is the prediction of protein secondary structure from the primary structure. Several methods have been developed to make and improve secondary structure predictions for proteins; these are amongst the oldest algorithms used in bioinformatics, the earliest dating back to the 1970s (e.g. Chou & Fasman 1974, Lim 1974, Garnier 1978) [1-5]. Improving secondary structure prediction is relevant because secondary structure predictions support a wide variety of conclusions about the fold classification and function of a protein and, in particular, provide important information for 3D-structure prediction [6]. Furthermore, the results of secondary structure prediction have been an aid in designing new proteins [7], predicting the effect of point mutations, identifying the protein class (for instance, all-α or all-β proteins), and predicting epitopes [8]. In this report we investigate the possibility of combining predictions from several secondary structure prediction methods to form a consensus prediction. The goal of a consensus method is to improve the final prediction result in comparison with the individual predictions.
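The idea of a consensus prediction can be illustrated with a minimal sketch. This is not the neural network used in this study: it is a plain weighted per-residue vote, shown only to make the notion of "weights per prediction method" concrete. The method names (`method_a`, ...), the weight values, and the three-letter state alphabet (`H` for helix, `E` for strand, `C` for other) are assumptions chosen for illustration.

```python
# Illustrative sketch only: a fixed weighted vote, not the trained
# feed-forward network described in this chapter.

STATES = ("H", "E", "C")  # assumed alphabet: helix, strand, other


def consensus(predictions, weights):
    """Combine per-residue predictions by a weighted vote.

    predictions: dict mapping a (hypothetical) method name to a string of
                 states, one character per residue.
    weights:     dict mapping method name to a non-negative weight;
                 methods without an entry get weight 1.0.
    """
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    n_residues = lengths.pop()

    result = []
    for i in range(n_residues):
        score = {state: 0.0 for state in STATES}
        for name, pred in predictions.items():
            state = pred[i]
            if state not in score:
                raise ValueError(f"unknown state {state!r} from {name}")
            score[state] += weights.get(name, 1.0)
        # Ties are broken by the order of STATES (H, then E, then C).
        result.append(max(STATES, key=lambda s: score[s]))
    return "".join(result)


if __name__ == "__main__":
    # Hypothetical predictions for a 10-residue fragment.
    preds = {
        "method_a": "HHHHCCEEEC",
        "method_b": "HHHCCCEEEC",
        "method_c": "CHHHHCCEEC",
    }
    w = {"method_a": 2.0, "method_b": 1.0, "method_c": 1.0}
    print(consensus(preds, w))  # HHHHCCEEEC
```

In this sketch the weights are fixed by hand; in a learned consensus they would instead be fitted to a training set of proteins with known secondary structure.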

The field of secondary structure prediction for proteins can be divided into two kinds of prediction. First, there is secondary structure class prediction, in which a protein is characterized as an all-α, all-β or α/β class protein. Second, there is 'normal' secondary structure prediction, in which the secondary structure state (α-helix, β-sheet or other) is
