1.4. Evaluation of prediction results
In order to compare the results of different secondary structure prediction programs, an objective score of prediction accuracy is required. The most widely used index is the three-state per-residue accuracy (Q_3). The formula below gives the percentage of residues predicted correctly for α-helix (q_α), β-strand (q_β) and other (q_c) out of the total number of residues (N).
Q_3 = (q_α + q_β + q_c) / N × 100% = percentage of correctly predicted residues
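The Q_3 definition above can be sketched as a few lines of Python; the function name `q3` and the single-letter state labels (H for helix, E for strand, C for other) are illustrative choices, not part of the original text.

```python
def q3(predicted, observed):
    """Three-state per-residue accuracy: the percentage of residues
    whose predicted state (H, E or C) matches the observed state.
    Summing the per-class correct counts q_a, q_b and q_c and dividing
    by N is the same as counting all matching positions."""
    assert len(predicted) == len(observed)
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# 7 of the 8 residues match, so Q3 is 87.5%.
print(q3("HHHEECCC", "HHHEECCH"))
```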
A closer look at the Q_3 value shows that it is not very informative when the target class is present in only a relatively small part of the data, because in that case correct prediction of the dominant non-regular class tends to inflate the three-state accuracy. A more precise measure that avoids this is the Matthews correlation coefficient (C) [24], which is defined by the formula shown below.
C = (t_p · t_n − f_p · f_n) / √[(t_p + f_p)(t_p + f_n)(t_n + f_p)(t_n + f_n)]
The value of the Matthews correlation coefficient lies between −1 and 1 and can be calculated from the numbers of true positive (t_p), true negative (t_n), false positive (f_p) and false negative (f_n) predicted residues.
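The coefficient can be computed directly from the four counts; this is a minimal sketch, with the function name `matthews_cc` and the zero-denominator convention chosen for illustration.

```python
import math

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for one class:
    (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any marginal count is zero
    return (tp * tn - fp * fn) / denom

# A perfect prediction yields 1.0.
print(matthews_cc(5, 5, 0, 0))
```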
1.5. Neural Networks
As classifiers for a consensus method, the decision-tree and "majority wins" principles are fairly simple and crude. A neural network could be a more complex and possibly better classifier for a consensus secondary structure prediction method, since it can exploit more of the information provided by the individual methods to determine when to use which method. But what is a neural network?
A definition in the DARPA Neural Network Study [25] states "… a neural network
is a system composed of many simple processing elements operating in parallel whose
function is determined by network structure, connection strengths, and the processing
performed at computing elements or nodes". Another slightly more recent definition reads
"artificial neural systems, or neural networks, are physical cellular systems which can
acquire, store, and utilise experiential knowledge" [26].
As mentioned in both definitions, a neural network consists of computing units
(processing elements, nodes or cells). These units can be grouped in layers: an input layer, a variable number of hidden layers and an output layer. These layers can be interconnected (see figure 1). Each unit receives input, which is transformed into output by a transfer function; biases can be incorporated into these transfer functions. The output can be passed on to one or several subsequent computing units, so the connected units form a network. Each connection between units has a weight attached to it. Arranging and programming the units in different configurations yields various types of neural networks, which bear illustrious names such as the Boltzmann machine, the Hebbian network and the Hopfield network.
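The layered feed-forward structure described above can be sketched in a few lines of Python. The weights, biases and the sigmoid transfer function are illustrative assumptions (an untrained toy network), not values from the text.

```python
import math

def sigmoid(x):
    """A common transfer function mapping any input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: each unit takes a weighted sum of
    the inputs, adds its bias, and applies the transfer function."""
    return [sigmoid(sum(w * i for w, i in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Toy network: 2 inputs -> 2 hidden units -> 1 output unit.
hidden = layer([0.5, -1.0], [[0.8, 0.2], [-0.4, 0.9]], [0.1, -0.1])
output = layer(hidden, [[1.5, -2.0]], [0.3])
print(output)  # a single value between 0 and 1
```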
 