• NNpredict [21] is a program that predicts the secondary structure type for each residue
in an amino acid sequence using a two-layer, feed-forward neural network.
Examples of hybrid methods are the programs PHD and PSIPRED. The program PHD
[17-19] combines multiple sequence alignment with several cascading neural networks
(previously trained on proteins of known structure); the program can generate its own
alignment from the submitted sequence. PSIPRED [10] incorporates two feed-forward
neural networks that analyse the output obtained from PSI-BLAST [22]. PHD and PSIPRED
are currently considered to be amongst the best performing methods. Both are hybrid
methods, which suggests that it is more profitable to combine principles than to rely on a
single method [10,14].
1.3. Consensus secondary structure predictions
Different ways of combining prediction principles into a hybrid secondary structure
prediction program are known. In the "standard approach", the prediction problem is broken
down into different tasks and the most appropriate strategy (or principle) is applied to each
task to improve the results. Another approach is "ensemble learning": here the focus is on a
single prediction task, for which multiple predictors or classifiers are built. The different
predictors are combined either by voting or by training a classifier to combine them.
A consensus method uses the latter principle, ensemble learning, to improve the prediction
results: the results of several secondary structure prediction programs are compared and
combined by a classifier. In the case of a secondary structure consensus method the multiple
predictors already exist, and their predictions only need to be combined into a consensus
predicted sequence, as sketched below.
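To make this concrete, the following sketch combines per-residue predictions from several programs position by position with a pluggable classifier. The function name, the three-state alphabet (H, E, C) and the calling convention are assumptions for illustration, not the implementation of any particular server.

    # A minimal consensus sketch, assuming every program's output is a string
    # over the states H (helix), E (strand) and C (coil), aligned to the same
    # query sequence. Names are illustrative only.

    def consensus_prediction(predictions, combine):
        """Apply a combining classifier position by position.

        predictions : list of equal-length state strings, one per program
        combine     : callable taking the list of states predicted at one
                      position and returning the consensus state
        """
        length = len(predictions[0])
        assert all(len(p) == length for p in predictions)
        return "".join(combine([p[i] for p in predictions]) for i in range(length))

Any of the classifiers discussed below (a decision tree, majority voting, or a trained neural network) could be plugged in as the combine argument.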
As mentioned before, a consensus method looks at the results of several different
prediction programs. To decide when to use the results of which program(s), a decision
mechanism or classifier has to be built into the method. Three such classifiers are discussed
below: the decision tree, majority wins (winner takes all), and the neural network.
A decision tree is a representation of a decision procedure for classifying a given
example [6]. Each internal node of the tree poses a question, with a branch for each possible
outcome of that question; each leaf node holds a classification. Decision trees have many
uses, particularly for problems that can be formulated as producing a single answer in the
form of a class name. Decision trees are constructed from examples that are already labelled.
A decision tree could be used to apply rules for assigning a secondary structure state to a
specific residue; in fact the next classifier can be viewed as a very short decision tree with
few questions.
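As a toy illustration of how such rules might look, the sketch below encodes a hand-written tree over the per-residue outputs of three programs. The choice of programs, the branch order and the fall-back to coil are invented for illustration; a real decision tree would be learned from labelled examples rather than written by hand.

    # A hypothetical, hand-written decision tree over three per-residue states.
    # All rules here are invented for illustration.

    def tiny_tree(phd_state, psipred_state, nnpredict_state):
        # Question 1: do the two strongest predictors agree?
        if phd_state == psipred_state:
            return phd_state                  # leaf: trust the agreement
        # Question 2: does NNpredict side with either of them?
        if nnpredict_state in (phd_state, psipred_state):
            return nnpredict_state            # leaf: two out of three agree
        return "C"                            # leaf: no agreement, fall back to coil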
The consensus program JPRED [23,33,34] uses the majority wins principle. Despite
all the efforts and different methods, the Q3 (percentage of correctly predicted residues) of
all the protein secondary structure prediction methods mentioned before lies between 60 and
80 percent. The makers of the consensus secondary structure server JPRED aimed to
improve this percentage by combining six different secondary structure prediction programs,
such as those mentioned before. The server is available through a web interface and no
neural network is used in making the consensus prediction. JPRED builds a consensus
prediction by comparing the results of these programs and taking, at each position, the
predicted state that is most abundant. The majority wins, which is why this principle is also
called the "winner takes all" method. JPRED correctly predicts 72.9 percent of protein
secondary structure.
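The following sketch shows the majority-wins rule together with a simple Q3 computation, assuming aligned per-residue state strings over H, E and C. It is not the JPRED implementation; the function names and example data are made up for illustration.

    from collections import Counter

    def majority_wins(states):
        """Return the most abundant state at one position (ties broken arbitrarily)."""
        return Counter(states).most_common(1)[0][0]

    def q3(predicted, observed):
        """Percentage of residues whose predicted state matches the observed one."""
        matches = sum(p == o for p, o in zip(predicted, observed))
        return 100.0 * matches / len(observed)

    # Example: consensus of three made-up program outputs, scored against a
    # known structure.
    programs = ["HHHECCC", "HHHECCE", "HHCECCC"]
    consensus = "".join(majority_wins(column) for column in zip(*programs))
    print(consensus, q3(consensus, "HHHEECC"))   # HHHECCC, roughly 85.7 percent

In this toy example six of the seven consensus states match the observed structure; the same per-residue comparison over a benchmark set of proteins yields the Q3 figures quoted above.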