A Preliminary Study on the Prediction of Human Protein Functions - Foundations on Natural and Artificial Computation


Fig. 3. Histogram of average accuracy differences between SVMs and SVMs applied to datasets with reduced dimensionality

4.2 Tournament Learning

In this series of experiments we focused on the 359 GO terms of the “F” branch. To apply a tournament learning strategy, the key idea was to determine all disjoint pairs of GO terms, that is, pairs of terms with no protein in their intersection. Each disjoint pair of GO terms can then be learned by a binary classifier such as an SVM.
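The pair-selection step can be sketched as follows. This is a minimal illustration, assuming annotations are available as a mapping from GO term to the set of proteins annotated with it; the function name `disjoint_pairs` and the toy identifiers are hypothetical and do not come from the study.

```python
from itertools import combinations


def disjoint_pairs(annotations):
    """Return all pairs of GO terms that share no annotated protein.

    `annotations` maps a GO term ID to the set of protein IDs annotated
    with that term (an assumed data layout, chosen for illustration).
    """
    terms = sorted(annotations)
    return [
        (a, b)
        for a, b in combinations(terms, 2)
        if annotations[a].isdisjoint(annotations[b])
    ]


# Toy data with made-up identifiers.
toy = {
    "GO:A": {"P1", "P2"},
    "GO:B": {"P2", "P3"},  # shares P2 with GO:A, so (GO:A, GO:B) is excluded
    "GO:C": {"P4"},
}
pairs = disjoint_pairs(toy)
print(pairs)
```

Each returned pair would then get its own binary classifier, trained on the proteins of the first term against the proteins of the second.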

Each GO term was disjoint from 259.4 other terms on average. Thus, for the tournament learning strategy we defined 46'564 binary classifiers, one for each disjoint pair. The union of the training proteins associated with the 359 GO terms comprised more than 7'131 proteins. We defined the score of a predicted GO term as the proportion of classifiers predicting this GO term.
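The scoring rule can be sketched as a simple vote count. This is an illustrative sketch only: it assumes that every pairwise classifier casts one vote for one of its two GO terms and that the proportion is taken over all classifiers; the text does not specify the denominator, and the names `tournament_scores` and `ranked_terms` are hypothetical.

```python
from collections import Counter


def tournament_scores(votes):
    """Score each GO term by the fraction of pairwise classifiers voting for it.

    `votes` holds, for one protein, the GO term chosen by each pairwise
    classifier. Assumption: the denominator is the total number of classifiers.
    """
    if not votes:
        return {}
    counts = Counter(votes)
    total = len(votes)
    return {term: n / total for term, n in counts.items()}


def ranked_terms(votes):
    """Return GO terms sorted by decreasing score, ties broken by term ID."""
    scores = tournament_scores(votes)
    return sorted(scores, key=lambda term: (-scores[term], term))


# Toy votes from four hypothetical pairwise classifiers.
votes = ["GO:A", "GO:C", "GO:C", "GO:B"]
scores = tournament_scores(votes)
ranking = ranked_terms(votes)
print(scores, ranking)
```

Sorting the terms by this score gives the ranked list of predicted functions for a protein.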

All the SVM predictors were applied to the independent dataset of 44 proteins to produce, for each protein, a ranked list of predicted functions. Figure 4 shows the histogram of matches with respect to their rank. Within the first 50 ranked terms we obtained 108 matches, corresponding to an average recall of 51.7% and an average precision of 4.9%. Prediction performance therefore improved with respect to the previous experiments.
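As a consistency check, 108 matches over 44 proteins with 50 predictions each gives 108 / (44 × 50) ≈ 4.9%, which agrees with the reported average precision. The per-protein computation might look like the sketch below, which assumes precision is the number of matches divided by the cutoff k and recall is the number of matches divided by the number of true terms; the function name and toy data are hypothetical.

```python
def precision_recall_at_k(ranked, true_terms, k):
    """Count matches among the top-k predictions and derive precision and recall.

    `ranked` is the list of predicted GO terms in decreasing score order and
    `true_terms` is the set of known annotations of the protein.
    """
    top = ranked[:k]
    matches = sum(1 for term in top if term in true_terms)
    precision = matches / k
    recall = matches / len(true_terms) if true_terms else 0.0
    return matches, precision, recall


# Toy example with made-up identifiers.
ranked = ["GO:C", "GO:A", "GO:B", "GO:D"]
true_terms = {"GO:A", "GO:D", "GO:E"}
matches, precision, recall = precision_recall_at_k(ranked, true_terms, k=2)

# Pooled precision implied by the figures reported in the text.
pooled_precision = 108 / (44 * 50)
print(matches, precision, recall, round(pooled_precision, 3))
```

Averaging the per-protein precision and recall over the test proteins yields the kind of summary values quoted above.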

The average rank is defined as the average of all the matched positions in the list of predicted functions. The mean of these average ranks was 83.7.
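The average rank of one protein can be sketched as below. This sketch assumes 1-based positions and returns `None` for a protein with no match, which the text does not address; `average_rank` and the toy data are hypothetical.

```python
def average_rank(ranked, true_terms):
    """Mean 1-based position of the true GO terms found in the ranked list.

    Returns None when no true term appears in the list (an assumption made
    here for illustration).
    """
    positions = [
        pos for pos, term in enumerate(ranked, start=1) if term in true_terms
    ]
    if not positions:
        return None
    return sum(positions) / len(positions)


# Toy example: the true terms sit at positions 2 and 4.
ranked = ["GO:C", "GO:A", "GO:B", "GO:D"]
avg = average_rank(ranked, {"GO:A", "GO:D"})
no_match = average_rank(ranked, {"GO:Z"})
print(avg, no_match)
```

The summary value is then the mean of these per-protein averages.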

4.3 Multi-label Learning

Again, we focused on the 359 GO terms of the “F” branch. We first carried out ten repetitions of 10-fold cross-validation on the proteins of the training set. We performed the computations with the Matlab software package proposed in [16]. The feed-forward neural architectures had 33'102 neurons in the
