Protein Secondary Structure Prediction: Comparison of Ten Common Prediction Algorithms Using a Neural Network - Essays in Bioinformatics


predicted for each residue of a protein. In this study the predictions of computer programs,

or methods, for 'normal' secondary structure prediction were used. It is therefore

worth noting that experiments were done to examine the difference in prediction

of α-helix or β-sheet between secondary structure prediction methods. Wherever prediction of

α-helix or prediction of β-sheet is mentioned below, one should keep in mind that this

means 'normal' secondary structure prediction per residue and not protein class prediction.

1.2. Methods in secondary structure prediction

Methods for protein secondary structure prediction are designed around

different underlying prediction principles. Some of these principles, and the methods that

use them, are listed below in no particular order: statistical analysis

[1,2]; simple linear statistics and information theory [5,8,9]; neural networks and machine

learning [10,11]; k-way nearest neighbour [12,13]; linear discrimination [14]; hydrogen

bonding propensities [15]; conservation number weighted prediction [16]; and hybrid

methods, which combine several principles [17-19].

In the section below we will briefly describe the main characteristics of these

algorithms:

• The Chou-Fasman method uses statistical analysis to predict secondary structure [1]. In

the first implementation of this method only 15 proteins of known 3D structure were

analysed, and residues were rated according to their ability to initiate or terminate

particular secondary structure elements. Residues were classified into strong formers,

formers, weak formers, indifferent residues, breakers and strong breakers. In later

updates of the algorithm a more elaborate database of protein tertiary structures was

used [2].

• The Garnier method [5] uses simple linear statistics and information theory to make

secondary structure predictions. Besides information theory the algorithm, like Chou-

Fasman, uses statistical data extracted from structural databases. Furthermore, the Garnier

method also takes into account the accuracy of the data: the likelihood for each residue and

its neighbouring residues to be in a certain conformation was obtained by examining data

collected from 8 residues on either side of each amino acid residue. In this way a protein

can be scanned with a 17-residue window, which yields the likelihood of each

residue to assume a specific secondary structure. The algorithm has seen several

revisions; GOR4, the fourth and more recent version of the Garnier secondary

structure prediction method, is likewise based on information theory [8]. In this version the

prediction of beta turns and random coil structure has been abandoned.

• The program DSC (Discrimination of protein Secondary structure Class) of King &

Sternberg [14] combines several secondary structure prediction principles. DSC applies

Garnier residue attributes, amino acid hydrophobicity values and amino acid positional

information. Information from a multiple sequence alignment is also used to perform

the secondary structure prediction. Simple linear statistical methods are applied to

filter the different prediction concepts and to remove false predictions.

• PREDATOR2 [15] is a secondary structure prediction method that predicts

secondary structure on the basis of hydrogen bonding propensities and non-local

interaction statistics. These propensities were calculated for each of the 400 possible

amino-acid pairs (20 × 20 residue types). Furthermore, local pairwise alignments are used to incorporate

information from homologous proteins.

• SIMPA96 [20] is a nearest neighbour secondary structure prediction method that

uses a similarity matrix, a similarity threshold and information from a database of known

secondary structures.
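
The window scan described for the Garnier method (8 residues on either side of each position, giving a 17-residue window) can be illustrated with a short sketch. This is a minimal illustration of the windowing idea only, not the GOR algorithm itself: the function names, the three state labels and the layout of the score table are assumptions made for this example, and the scores in the toy table are placeholders, not published parameters.

```python
from typing import Dict, List, Tuple

HALF_WINDOW = 8  # 8 residues on either side -> 17-residue window

# Assumed state labels: coil, helix, sheet. Ties resolve to the first entry,
# so a residue with no evidence at all is reported as coil.
STATES = ("C", "H", "E")

# Hypothetical score table: (state, offset from window centre, residue) -> score.
ScoreTable = Dict[Tuple[str, int, str], float]


def window_at(sequence: str, i: int, half: int = HALF_WINDOW) -> List[Tuple[int, str]]:
    """Return (offset, residue) pairs for the window centred on position i.

    The window is clipped at the ends of the sequence, so positions near
    the termini see fewer than 2 * half + 1 residues.
    """
    start = max(0, i - half)
    end = min(len(sequence), i + half + 1)
    return [(j - i, sequence[j]) for j in range(start, end)]


def predict(sequence: str, table: ScoreTable) -> str:
    """Assign one state per residue by summing window scores for each state."""
    states = []
    for i in range(len(sequence)):
        totals = {s: 0.0 for s in STATES}
        for offset, residue in window_at(sequence, i):
            for s in STATES:
                # Missing table entries contribute nothing.
                totals[s] += table.get((s, offset, residue), 0.0)
        # max() keeps the first state in STATES when totals are tied.
        states.append(max(STATES, key=lambda s: totals[s]))
    return "".join(states)


# Toy table with placeholder values, for demonstration only.
TOY_TABLE: ScoreTable = {
    ("H", 0, "A"): 1.0,
    ("E", 0, "V"): 1.0,
}

if __name__ == "__main__":
    print(predict("AAVVG", TOY_TABLE))
```

A real method would fill the table with statistics derived from proteins of known structure; the sketch only shows how each residue's prediction draws on its 16 neighbours as well as on the residue itself.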
