5 Discussion and Conclusion
We proposed to tackle the problem of human protein function prediction with
three distinct supervised learning schemes: one-versus-all classification,
tournament learning, and multi-label learning (MLL). Compared with the
cross-validation trials performed on the training set, the independent testing
set was much more difficult to predict. The main reason is probably that the
training dataset contains a significant proportion of protein molecular
functions that can be deduced directly from the presence of InterPro domains
and PROSITE patterns/profiles.
We are not yet able to predict difficult cases with high precision: average
precision on the difficult independent set was below 5%. However, on this
dataset average recall reached a reasonable level for the first 50 ranked
predictions, especially with the tournament learning scheme.
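As an illustration of how precision and recall over the first 50 ranked predictions can diverge, the following minimal Python sketch computes both measures for a single protein and averages them over a test set. The exact definitions used in the experiments are not restated in this section, so the function names, the choice of dividing by the number of predictions actually returned, and the toy labels are all assumptions made for illustration only.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_recall_at_k(
    ranked: Sequence[str], relevant: Set[str], k: int = 50
) -> Tuple[float, float]:
    """Precision and recall over the top-k ranked labels of one protein.

    Assumption: `ranked` holds unique function labels, best first, and
    precision is divided by the number of labels actually returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for label in top_k if label in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_recall_at_k(
    cases: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 50
) -> Tuple[float, float]:
    """Mean of the per-protein precision and recall at k."""
    scores = [precision_recall_at_k(ranked, relevant, k) for ranked, relevant in cases]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


# Hypothetical protein: 50 ranked labels, 2 true functions, both retrieved.
ranked_labels = [f"F{i}" for i in range(50)]
true_functions = {"F3", "F41"}
precision, recall = precision_recall_at_k(ranked_labels, true_functions, k=50)
```

In this toy case recall is 1.0 while precision is 2/50 = 0.04: when a protein has only a few true functions, a list of 50 predictions cannot reach high precision even if every true function is retrieved. This arithmetic is only one possible reading of the gap between the two measures, not an analysis of the reported results.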
In the future we will try to improve the results by combining tournament
learning and multi-label learning. Finally, rule extraction will also play an
important role, as a major concern for biologists is to understand how a
machine learning model arrives at a particular prediction, especially when
there is no explicit or simple relationship between inputs and outputs.
Acknowledgements
The authors gratefully thank SystemsX.ch for having funded this work under
grant IPP 2009 41. All the computations were performed at the Vital-IT
(http://www.vital-it.ch) Center for high-performance computing of the Swiss
Institute of Bioinformatics.