provided to the predictors correspond to terms of a Gene Ontology (GO) [12].
GO provides a structured description of the possible functions of genes or pro-
teins and is classically used as a benchmark for functional prediction.
In [7] neural networks and SVMs were proposed to predict human protein
functions related to 14 GO terms. Based on five-fold cross-validation trials the
average recall was 50% and the average specificity above 90%. In [16] multi-label
learning was applied to yeast gene function prediction, with
target classes defined by GO terms. With the use of neural networks the average
precision during cross-validation trials was 75.6%. Vens et al. adapted decision
trees to the multi-label learning scheme on yeast gene function prediction [15].
Target classes were defined by the FunCat hierarchy classification [9] and GO
terms. They concluded that learning a single model for all classes gave better
average accuracy than learning a separate model for each class. Based on the nearest
neighbour classifier and 24 functional classes, the authors of [6] obtained aver-
age predictive accuracy for mouse gene function prediction equal to 69.1%. In
[10] a standardised collection of mouse functional genomic datasets was assembled.
Nine bioinformatics teams independently trained classifiers and generated
predictions of GO terms. The conclusion was that at a recall rate of 20%, a
unified set of predictions averaged 41% precision. On yeast, using support
vector machine predictors combined with a Bayesian network, the average recall and
average precision obtained on an independent dataset for GO terms were 7% and 51%,
respectively [2]. In [4] the SVMProt predictor was presented. The
authors focused on a number of protein classes collected from several databases
that encompassed all major classes of enzymes, receptors, transporters, chan-
nels, DNA-binding proteins and RNA-binding proteins. The obtained accuracy
for protein family classification was found to be in the range of 69.1%-99.6%.
To the best of our knowledge no other work has explored the prediction of
human protein functions with the use of GO targets learned by tournament
learning (also denoted as one-versus-one learning) and multi-label learning [13].
In cross-validation trials performed with multi-label learning applied
to neural networks, the average precision was 63.4%. With an independent dataset
including very difficult cases the recall measure reached a reasonable performance
for the first 50 ranked predictions, on average; however, average precision was
quite low. In the following sections we introduce the methods applied to predict
protein functions, then describe three series of experiments, followed by a
conclusion.
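
As a reading aid, the evaluation measures quoted above (recall, precision, specificity, and recall over the first ranked predictions) can be sketched as follows. This is a minimal illustration using the standard definitions; the function names, the toy labels and the placeholder term names are assumptions introduced here and are not taken from the cited works.

```python
def binary_metrics(y_true, y_pred):
    """Recall, precision and specificity for a single target class.

    y_true and y_pred are equal-length sequences of 0/1 labels.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def recall_at_k(scores, relevant, k):
    """Fraction of the true terms found among the k highest-scored terms.

    scores maps each candidate term to a predicted score;
    relevant is the set of terms that truly apply.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for term in ranked if term in relevant)
    return hits / len(relevant) if relevant else 0.0


# Toy data (hypothetical): five proteins scored against one class.
m = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])

# Toy data (hypothetical): one protein, three placeholder term names.
r = recall_at_k({"term_a": 0.9, "term_b": 0.2, "term_c": 0.7},
                {"term_a", "term_b"}, k=2)
```

Averaging such per-class or per-protein values over all classes or proteins gives figures of the kind reported above; how each cited study performed the averaging is not specified here.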
2 Models and Methods
In this work we compare several learning models: decision trees [3], SVMs [14],
and neural networks trained by multi-label learning [16]. As SVMs are binary
classifiers, a classification problem of K classes (with K > 2) is transformed into K
binary classification problems learned by K classifiers. For each classifier there is
a positive class and a negative class encompassing the other K − 1 classes. This
classification scheme is denoted as one-versus-all classification. In tournament
 