A Preliminary Study on the Prediction of Human Protein Functions - Foundations on Natural and Artificial Computation


provided to the predictors correspond to terms of the Gene Ontology (GO) [12]. GO provides a structured description of the possible functions of genes or proteins and is classically used as a benchmark for functional prediction.

In [7] neural networks and SVMs were proposed to predict human protein functions related to 14 GO terms. Based on five-fold cross-validation trials, the average recall was 50% and the average specificity was above 90%. In [16] multi-label learning was applied to yeast gene function prediction, with target classes defined by GO terms. With the use of neural networks, the average precision during cross-validation trials was 75.6%. Vens et al. adapted decision trees to the multi-label learning scheme for yeast gene function prediction [15]. Target classes were defined by the FunCat hierarchical classification [9] and GO terms. They concluded that learning a single model for all classes gave better average accuracy than learning each class independently. Based on the nearest neighbour classifier and 24 functional classes, the authors of [6] obtained an average predictive accuracy of 69.1% for mouse gene function prediction. In [10] a standardised collection of mouse functional genomic datasets was assembled. Nine bioinformatics teams independently trained classifiers and generated predictions of GO terms. The conclusion was that, at a recall rate of 20%, a unified set of predictions averaged 41% precision. On yeast, based on support vector machine predictors and a Bayesian network, the average recall and the average precision obtained on an independent dataset were 7% and 51%, respectively, on GO terms [2]. In [4] the SVMProt predictor was presented. The authors focused on a number of protein classes collected from several databases that encompassed all major classes of enzymes, receptors, transporters, channels, DNA-binding proteins and RNA-binding proteins. The obtained accuracy for protein family classification was found to be in the range of 69.1%-99.6%.
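
The studies above report recall, specificity and precision. As a reminder of how these measures relate to each other, the following minimal sketch computes them for a single binary target from true and predicted labels. The function name and the toy labels are illustrative assumptions and are not taken from any of the cited works, which may average these measures over classes or over proteins in different ways.

```python
def binary_metrics(y_true, y_pred):
    """Return (recall, specificity, precision) for one binary target.

    y_true and y_pred are equal-length sequences of 0/1 labels.
    A ratio with a zero denominator is reported as 0.0 (a convention
    chosen here for illustration).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, specificity, precision


# Toy example (hypothetical labels): 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
recall, specificity, precision = binary_metrics(y_true, y_pred)
print(recall, specificity, precision)
```

In this toy case half of the positives are recovered (recall 0.5) while five of six negatives are correctly rejected, which shows how a high specificity can coexist with a moderate recall when negatives dominate.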

To the best of our knowledge, no other work has explored the prediction of human protein functions with the use of GO targets learned by tournament learning (also denoted as one-versus-one learning) and multi-label learning [13]. In cross-validation trials performed with multi-label learning applied to neural networks, the average precision was 63.4%. With an independent dataset including very difficult cases, the recall measure reached a reasonable performance for the first 50 ranked predictions, on average; however, average precision was quite low. In the following sections we introduce the methods applied to predict protein functions, then we describe three series of experiments, followed by a conclusion.
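
The recall over the first ranked predictions mentioned above can be illustrated with a small sketch. It assumes that each protein receives one score per candidate term, that terms are ranked by decreasing score, and that recall is the fraction of the protein's true terms found among the first k ranked terms. This definition, the function name and the toy scores are assumptions made for illustration; the exact measure used in the experiments may differ.

```python
def recall_at_k(scores, true_terms, k):
    """Fraction of true terms found among the k highest-scoring terms.

    scores: dict mapping a term identifier to a predicted score.
    true_terms: set of term identifiers annotated for the protein.
    """
    if not true_terms:
        return 0.0  # convention chosen here for proteins without annotation
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & set(true_terms)) / len(true_terms)


# Toy example with hypothetical term names and scores.
scores = {"term_a": 0.9, "term_b": 0.2, "term_c": 0.7, "term_d": 0.4}
true_terms = {"term_a", "term_d"}
print(recall_at_k(scores, true_terms, 2))  # only term_a is in the top 2
print(recall_at_k(scores, true_terms, 3))  # term_a and term_d are in the top 3
```

Averaging this value over all proteins of a test set gives one number per cut-off k, which is how a statement about "the first 50 ranked predictions, on average" could be obtained under these assumptions.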

2 Models and Methods

In this work we compare several learning models: decision trees [3], SVMs [14], and neural networks trained by multi-label learning [16]. As SVMs are binary classifiers, a classification problem of K classes (with K > 2) is transformed into K binary classification problems learned by K classifiers. For each classifier there is a positive class and a negative class encompassing the other K − 1 classes. This classification scheme is denoted as one-versus-all classification. In tournament
