In the second series of experiments we used a different training strategy,
denoted “tournament learning”. From the 359 GO terms of the “F” sub-ontology
we selected disjoint pairs of GO terms (i.e. pairs of GO terms having no
proteins in common), trained the resulting 46,564 predictors, and tested them
on the independent testing set. In the third series of experiments we applied
the MLL learning scheme to the 359 GO terms of the “F” sub-ontology.
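The pair selection used for tournament learning can be illustrated with a minimal sketch. It assumes the annotations are available as a mapping from each GO term to the set of proteins annotated with it; the function name `disjoint_term_pairs` and the toy identifiers are hypothetical and do not come from the study. For 359 terms there are at most 359 × 358 / 2 = 64,261 candidate pairs, of which the text reports 46,564 as disjoint.

```python
from itertools import combinations


def disjoint_term_pairs(term_to_proteins):
    """Return all pairs of GO terms whose protein sets share no protein.

    term_to_proteins: dict mapping a GO term identifier to a set of
    protein identifiers (an assumed data layout, for illustration only).
    """
    terms = sorted(term_to_proteins)
    return [
        (a, b)
        for a, b in combinations(terms, 2)
        if term_to_proteins[a].isdisjoint(term_to_proteins[b])
    ]


# Toy annotation table with hypothetical identifiers (not data from the study).
annotations = {
    "GO:A": {"P1", "P2"},
    "GO:B": {"P3"},
    "GO:C": {"P2", "P4"},
}

# GO:A and GO:C share protein P2, so that pair is excluded.
pairs = disjoint_term_pairs(annotations)
```

Each retained pair would then define one binary classification problem, with the proteins of one term as positives and those of the other as negatives.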
4.1 One-versus-All Experiments
Within this framework the number of negative instances is very often much
greater than the number of positive instances. In other words, classes with a
small proportion of proteins may be difficult to predict. For that reason we
decided to perform many evaluation trials, varying the proportion of negative
examples.
The models we used during the experiments were linear SVMs provided by the
Matlab Bioinformatics Toolbox. Learning parameters were set to default values
and results were evaluated on recall (= tp/(tp + fn); tp = true positives;
fn = false negatives), precision (= tp/(tp + fp); fp = false positives) and
specificity (= tn/(tn + fp); tn = true negatives). Figure 1 illustrates average
recall, average precision and average specificity as the proportion factor of
negative proteins varies from 1 to 4. The points of the curves represent
average values obtained after 10 repetitions of 10-fold cross-validation over
all the 1930 GO terms.
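The evaluation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' Matlab code: the text does not say how negatives were chosen for a given proportion factor, so uniform random sampling is assumed here, and the names `subsample_negatives` and `evaluate` are hypothetical. Labels are taken to be 1 for positive and 0 for negative.

```python
import random


def subsample_negatives(positives, negatives, factor, seed=0):
    """Keep all positives and draw factor * len(positives) negatives.

    Uniform random sampling is an assumption; the source only states that
    the proportion of negative examples was varied from 1 to 4.
    """
    rng = random.Random(seed)
    n_neg = min(len(negatives), factor * len(positives))
    return list(positives), rng.sample(list(negatives), n_neg)


def evaluate(y_true, y_pred):
    """Compute recall, precision and specificity from 0/1 label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    nan = float("nan")  # returned when a denominator is zero
    return {
        "recall": tp / (tp + fn) if tp + fn else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


# Toy example: 2 positives, 10 candidate negatives, proportion factor 3.
pos, neg = subsample_negatives(["p1", "p2"], [f"n{i}" for i in range(10)], 3)

# Toy predictions: tp = 2, fn = 1, fp = 1, tn = 4.
metrics = evaluate([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

With a proportion factor of 1 the problem is balanced; larger factors add negatives, which changes the fp and tn counts and therefore affects precision and specificity directly.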
On 1930 balanced classification problems we compared SVMs with CART
decision trees. We used the Statistics Toolbox of Matlab, leaving the learning
parameters at their default values. Figure 2 shows the histogram of average
accuracy differences obtained by 10 repetitions of 10-fold cross-validation
(for all the datasets). For SVMs the average of the average accuracies was
78.9%, while for
Fig. 1. From bottom to top: average recall, average precision and average
specificity as the proportion factor of negative proteins varies from 1 to 4,
over 1930 GO terms
 