Fig. 2. Histogram of average accuracy differences between SVMs and CART decision
trees obtained by 10 repetitions of 10-fold cross-validation over 1930 GO terms.
CART decision trees we obtained 72.0%. Thus, on the majority of the GO terms
SVMs perform better than CARTs.
We also performed experiments to reduce the dimensionality of the classification
problems by Principal Component Analysis (PCA). During cross-validation trials
we used balanced datasets with a number of principal components equal to the
number of training examples. Note that for all the GO terms the dimensionality
of the problem was always greater than the number of training proteins. The
results are shown in Figure 3. For SVMs applied to the datasets representing the
extracted principal components we obtained an accuracy of 71.9%. Thus, on the
majority of the GO terms, SVMs applied to the full-dimensional problem perform
better than SVMs applied to the reduced-dimensionality problem. We found similar
results when increasing the proportion of negative proteins for the GO terms.
We evaluated the one-versus-all classification strategy on the independent
testing set of 44 proteins. We took into account 359 GO terms of the “F”
sub-ontology. For each GO term we combined 9 SVM classifiers for which we varied
the proportion of negative proteins from 1 to 9. Note that, compared with the
ninth classifier, the first classifier is more sensitive to positive proteins
but will also give more false positives.
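
The construction of the 9 training sets per GO term can be sketched as follows.
This is only an illustration under an assumption the text does not spell out:
that "proportion of negative proteins from 1 to 9" means 1 to 9 negative
proteins per positive protein, drawn by random subsampling. The function name,
the seed and the protein identifiers are hypothetical, and the SVM training step
itself is omitted.

```python
import random


def build_training_sets(positives, negatives, max_ratio=9, seed=0):
    """Return one (ratio, positives, sampled_negatives) tuple per ratio.

    ASSUMPTION: ratio = number of negatives per positive (1..max_ratio),
    with negatives drawn by random subsampling without replacement.
    """
    rng = random.Random(seed)
    training_sets = []
    for ratio in range(1, max_ratio + 1):
        # Never ask for more negatives than are available.
        n_neg = min(len(negatives), ratio * len(positives))
        sampled = rng.sample(negatives, n_neg)
        training_sets.append((ratio, list(positives), sampled))
    return training_sets


# Hypothetical toy data: 2 positive and 30 negative proteins for one GO term.
positives = ["p1", "p2"]
negatives = [f"n{i}" for i in range(30)]
sets_for_term = build_training_sets(positives, negatives)
for ratio, pos, neg in sets_for_term:
    print(ratio, len(pos), len(neg))
```

Under this reading, the first training set is balanced and the ninth contains
nine times more negatives than positives, which is consistent with the first
classifier being the most sensitive and the most prone to false positives.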
For each protein of the testing set and for each GO term a score was calculated
as the proportion of classifiers that predicted the presence of that GO term. By
sorting the scores of all the GO terms in descending order we obtained a ranked
list of predictions. The average recall on the first 50 ranked GO terms was
39/209 = 18.7% and the precision was 39/(50*44) = 1.8%. The cross-validation
results had suggested much more optimistic recall and precision. In other words,
for the testing set too many false positives were present in the first positions
of the predicted functions.
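
The scoring, ranking and evaluation arithmetic described above can be sketched
as follows. This is a minimal illustration, not the authors' code: the function
names, the toy votes and the GO term labels ("GO:A", "GO:B", "GO:C") are
hypothetical, and breaking ties by label is an assumption. Only the counts 39,
209, 50 and 44 come from the text.

```python
def vote_score(votes):
    """Fraction of classifiers that predicted the GO term (votes are 0 or 1)."""
    return sum(votes) / len(votes)


def rank_terms(votes_by_term):
    """Sort GO terms by descending score; ties broken by label (assumption)."""
    scores = {term: vote_score(v) for term, v in votes_by_term.items()}
    return sorted(scores, key=lambda term: (-scores[term], term))


def recall_precision(correct, total_annotations, k, n_proteins):
    """Recall and precision over the top-k predictions of every protein."""
    recall = correct / total_annotations
    precision = correct / (k * n_proteins)
    return recall, precision


# Hypothetical votes of the 9 classifiers for one protein and three GO terms.
votes_by_term = {
    "GO:B": [1, 1, 1, 0, 0, 0, 0, 0, 0],
    "GO:A": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "GO:C": [0, 0, 0, 0, 0, 0, 0, 0, 0],
}
ranking = rank_terms(votes_by_term)
print(ranking)  # ['GO:A', 'GO:B', 'GO:C']

# Figures from the text: 39 correct predictions, 209 true annotations,
# top 50 GO terms for each of the 44 test proteins.
recall, precision = recall_precision(39, 209, 50, 44)
print(f"recall={recall:.1%} precision={precision:.1%}")
# recall=18.7% precision=1.8%
```

The precision denominator is 50 * 44 = 2200 because each of the 44 proteins
contributes 50 predictions, so even a perfect ranking could reach at most
209/2200 = 9.5% precision at this cutoff.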