predicted for each residue of a protein. In this study the predictions of computer programs,
or methods, for 'normal' secondary structure prediction were used. It is therefore
worth noting that experiments were carried out to examine differences in the prediction
of α-helix and β-sheet between secondary structure prediction methods. Wherever the
prediction of α-helix or β-sheet is mentioned below, the reader should bear in mind that
this means 'normal' per-residue secondary structure prediction and not protein class prediction.
1.2. Methods in secondary structure prediction
Methods for protein secondary structure prediction are built on different underlying
prediction principles. Some of these principles, together with methods that use them, are
listed below in no particular order: statistical analysis [1,2]; simple linear statistics
and information theory [5,8,9]; neural networks and machine learning [10,11]; k-nearest
neighbour [12,13]; linear discrimination [14]; hydrogen bonding propensities [15];
conservation-number-weighted prediction [16]; and hybrid methods, which combine several
principles [17-19].
Below we briefly describe the main characteristics of some of these algorithms:
• The Chou-Fasman method uses statistical analysis to predict secondary structure [1]. In
the first implementation of this method only 15 proteins of known 3D structure were
analysed, and residues were assigned according to their ability to initiate or terminate
particular secondary structure elements. Residues were classified as strong formers,
formers, weak formers, indifferent, breakers and strong breakers. In later updates of the
algorithm a more extensive database of protein tertiary structures was used [2].
• The Garnier method [5] uses simple linear statistics and information theory to make
secondary structure predictions. Besides information theory, the algorithm, like Chou-Fasman,
uses statistical data extracted from structural databases. In addition, the Garnier method
takes neighbouring residues into account: the likelihood of each residue adopting a
particular conformation was derived from data collected from the 8 residues on either side
of it. A protein can thus be scanned with a 17-residue window (8 + 1 + 8), which yields the
likelihood of each residue adopting a specific secondary structure. The algorithm has been
revised several times; GOR4, the fourth and more recent version of the Garnier secondary
structure prediction method, is also based on information theory [8]. In this version the
prediction of β-turns and random coil structure has been abandoned.
• The program DSC (Discrimination of protein Secondary structure Class) of King &
Sternberg [14] combines several secondary structure prediction principles. DSC uses
Garnier residue attributes, amino acid hydrophobicity values and amino acid positional
information. Information from a multiple sequence alignment is also used to perform the
prediction. Simple linear statistical methods are applied to combine the different
prediction concepts and to filter out false predictions.
• PREDATOR2 [15] is a secondary structure prediction method that predicts secondary
structure on the basis of hydrogen bonding propensities and non-local interaction
statistics. These propensities were calculated for each of the 400 possible amino-acid
pairs. In addition, local pairwise alignments are used to incorporate information from
homologous proteins.
• SIMPA96 [20] is a nearest-neighbour secondary structure prediction method that uses a
similarity matrix, a similarity threshold and information from a database of known
secondary structures.
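The Chou-Fasman grouping of residues into formers and breakers can be illustrated with a minimal sketch for the α-helix case. The propensity values follow the commonly quoted Chou-Fasman helix table and should be checked against [1] before any real use; the class cutoffs in `CUTOFFS` are hypothetical values chosen for illustration, not published thresholds.

```python
"""Sketch: bin residues into Chou-Fasman-style helix former/breaker classes.

Assumptions (not from the text): the propensities follow the commonly quoted
Chou-Fasman helix table, and the class cutoffs are hypothetical.
"""

# Helix propensities P(alpha) per one-letter amino-acid code (illustrative).
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}

# Hypothetical lower bounds for each class, checked from the top down.
CUTOFFS = [
    (1.20, "strong former"),
    (1.05, "former"),
    (1.00, "weak former"),
    (0.70, "indifferent"),
    (0.60, "breaker"),
]


def classify(residue: str) -> str:
    """Return the helix-forming class of a one-letter residue code."""
    p = HELIX_PROPENSITY[residue]
    for lower_bound, label in CUTOFFS:
        if p >= lower_bound:
            return label
    # Anything below the lowest cutoff is treated as a strong breaker.
    return "strong breaker"


def classify_sequence(seq: str) -> list[tuple[str, str]]:
    """Classify every residue of a sequence."""
    return [(aa, classify(aa)) for aa in seq]


if __name__ == "__main__":
    for aa, label in classify_sequence("AEGPKD"):
        print(aa, label)
```

Because the cutoffs are plain data, they can be swapped for a different scheme (for example β-sheet propensities) without changing `classify`.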
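The Garnier window scan can be sketched as follows, assuming a simplified single-residue (GOR I-style) model: for each residue, the information values of the residues at offsets -8 to +8 are summed per state, and the state with the largest total is predicted. The table produced by the hypothetical helper `toy_table` is made up purely to show the mechanics; GOR4 additionally uses residue-pair information, which this sketch omits.

```python
"""Sketch of a GOR-style (Garnier) 17-residue window scan, single-residue model.

Assumption: info[(state, offset, residue)] holds an information value for a
residue found `offset` positions away from the residue being predicted.
The toy table below is hypothetical and only demonstrates the mechanics.
"""

HALF_WINDOW = 8  # 8 residues on either side of the central residue
WINDOW = 2 * HALF_WINDOW + 1  # 17-residue window
# Coil is listed first so that ties (e.g. all-zero scores) resolve to coil.
STATES = ("C", "H", "E")


def gor_predict(seq: str, info: dict) -> str:
    """Predict one state per residue by summing information over the window."""
    prediction = []
    for i in range(len(seq)):
        scores = {}
        for state in STATES:
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                j = i + offset
                # Window positions that fall beyond the chain ends are skipped.
                if 0 <= j < len(seq):
                    total += info.get((state, offset, seq[j]), 0.0)
            scores[state] = total
        # max() returns the first state with the highest score.
        prediction.append(max(STATES, key=lambda s: scores[s]))
    return "".join(prediction)


def toy_table() -> dict:
    """Hypothetical information values that decay with distance from the centre."""
    info = {}
    for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
        weight = 1.0 / (1 + abs(offset))
        for aa in "AELM":
            info[("H", offset, aa)] = weight
        for aa in "VIYF":
            info[("E", offset, aa)] = weight
        for aa in "GPNS":
            info[("C", offset, aa)] = weight
    return info


if __name__ == "__main__":
    print(gor_predict("AELMAKGPGVIVYV", toy_table()))
```

Residues missing from the table simply contribute nothing, so a real information table estimated from structural data could be dropped in without changing `gor_predict`.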
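The table of 400 amino-acid pair propensities mentioned for PREDATOR2 (20 × 20 ordered pairs) can be illustrated with a small sketch. By assumption, the propensity of an ordered pair (a, b) is defined here as its observed frequency among hydrogen-bonded residue pairs divided by the frequency expected from the individual residue frequencies. The input list of bonded pairs is hypothetical, and this is not PREDATOR2's actual procedure.

```python
"""Sketch: a 20 x 20 table of amino-acid pair propensities.

Assumption: propensity(a, b) = observed frequency of the ordered pair (a, b)
among hydrogen-bonded residue pairs / (frequency of a * frequency of b).
This illustrates the idea of pair statistics; it is not PREDATOR2's procedure.
"""
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pair_propensities(bonded_pairs: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Return a propensity for every ordered pair of the 20 standard residues."""
    if not bonded_pairs:
        raise ValueError("need at least one observed pair")
    pair_counts = Counter(bonded_pairs)
    residue_counts = Counter()
    for a, b in bonded_pairs:
        residue_counts[a] += 1
        residue_counts[b] += 1
    n_pairs = len(bonded_pairs)
    n_residues = 2 * n_pairs

    table = {}
    for a, b in product(AMINO_ACIDS, repeat=2):  # 20 * 20 = 400 ordered pairs
        expected = (residue_counts[a] / n_residues) * (residue_counts[b] / n_residues)
        observed = pair_counts[(a, b)] / n_pairs
        # Residues never seen in a bonded pair get a propensity of 0.
        table[(a, b)] = observed / expected if expected > 0 else 0.0
    return table


if __name__ == "__main__":
    # Hypothetical observations of hydrogen-bonded residue pairs.
    observed = [("V", "I"), ("V", "I"), ("A", "L"), ("I", "V")]
    table = pair_propensities(observed)
    print(len(table), table[("V", "I")])
```

A value above 1 means the pair is seen more often than chance would predict; treating (a, b) and (b, a) as the same pair would instead give 210 unordered entries.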
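A nearest-neighbour prediction in the spirit of SIMPA96 can be sketched as follows: each query window is compared with every window in a small database of sequences with known secondary structure, windows scoring at or above a similarity threshold vote (weighted by their score) for the structure of their central residue, and the best-supported state is predicted. The 7-residue window, the grouped similarity matrix in `GROUPS`, the threshold value and the fallback to coil are all hypothetical choices, not the parameters of SIMPA96.

```python
"""Sketch of nearest-neighbour secondary structure prediction (SIMPA96-like).

Assumptions: a 7-residue window, a toy similarity matrix built from residue
groups, and a fallback to coil ("C") when no database window passes the
threshold. None of these are SIMPA96's actual parameters.
"""
from collections import defaultdict

HALF = 3  # hypothetical: 3 residues on either side (7-residue window)

# Hypothetical residue groups for a toy similarity matrix.
GROUPS = ["AVLIMC", "FWY", "KRH", "DE", "STNQ", "GP"]
GROUP_OF = {aa: k for k, group in enumerate(GROUPS) for aa in group}


def similarity(a: str, b: str) -> int:
    """Toy matrix: 2 for identical residues, 1 for the same group, else 0."""
    if a == b:
        return 2
    if a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
        return 1
    return 0


def window_score(s1: str, i: int, s2: str, j: int) -> int:
    """Sum similarities over the aligned windows centred on s1[i] and s2[j]."""
    score = 0
    for m in range(-HALF, HALF + 1):
        if 0 <= i + m < len(s1) and 0 <= j + m < len(s2):
            score += similarity(s1[i + m], s2[j + m])
    return score


def nn_predict(query: str, database: list[tuple[str, str]], threshold: int) -> str:
    """Predict one state per query residue from similar database windows."""
    result = []
    for i in range(len(query)):
        votes = defaultdict(float)
        for seq, structure in database:
            for j in range(len(seq)):
                s = window_score(query, i, seq, j)
                if s >= threshold:
                    votes[structure[j]] += s
        result.append(max(votes, key=votes.get) if votes else "C")
    return "".join(result)


if __name__ == "__main__":
    db = [("AAAAAAAA", "HHHHHHHH"), ("VIVIVIVI", "EEEEEEEE")]
    print(nn_predict("AAAAVIVI", db, threshold=8))
```

Raising `threshold` makes the method more conservative: fewer database windows qualify, and more positions fall back to coil.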