A Preliminary Study on the Prediction of
Human Protein Functions
Guido Bologna 1, Anne-Lise Veuthey 2, Marco Pagni 3,
Lydie Lane 1,4, and Amos Bairoch 1,4
1 CALIPHO Group, Swiss Institute of Bioinformatics
Rue Michel Servet 1, 1211 Geneva 4, Switzerland
Guido.Bologna@isb-sib.ch, Lydie.Lane@isb-sib.ch
2 Swiss-Prot Group, Swiss Institute of Bioinformatics
Rue Michel Servet 1, 1211 Geneva 4, Switzerland
Anne-Lise.Veuthey@isb-sib.ch
3 Vital-IT Group, Swiss Institute of Bioinformatics
Quartier Sorge - Genopode, 1015 Lausanne, Switzerland
Marco.Pagni@isb-sib.ch
4 Department of Structural Biology and Bioinformatics, University of Geneva
Rue Michel Servet 1, 1211 Geneva 4, Switzerland
Amos.Bairoch@unige.ch
Abstract. In the human proteome, about 5'000 proteins lack experi-
mentally validated functional information. In this work we propose to
tackle the problem of human protein function prediction by three dis-
tinct supervised learning schemes: one-versus-all classification; tourna-
ment learning; multi-label learning. Target values of supervised learning
models are represented by the nodes of a subset of the Gene Ontology,
which is widely used as a benchmark for functional prediction. With
an independent dataset including very difficult cases, the recall measure
reached a reasonable performance for the first 50 ranked predictions, on
average; however, average precision was quite low.
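As an illustration only, precision and recall over the first k ranked predictions can be sketched as follows. The sketch assumes that each protein has a ranked list of predicted Gene Ontology terms and a set of annotated terms, and that both measures are computed per protein before averaging. The function name and the GO identifiers are hypothetical and are not taken from this work.

```python
def precision_recall_at_k(ranked_terms, true_terms, k=50):
    """Precision and recall over the first k ranked predictions for one protein.

    ranked_terms: predicted GO terms, best first (hypothetical input format).
    true_terms:   annotated GO terms for the same protein.
    """
    top_k = ranked_terms[:k]
    true_set = set(true_terms)
    # Count predictions in the top k that match an annotated term.
    hits = sum(1 for term in top_k if term in true_set)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    return precision, recall


# Made-up example: 4 ranked terms, 2 annotated terms, cutoff k=3.
ranked = ["GO:0000001", "GO:0000002", "GO:0000003", "GO:0000004"]
annotated = ["GO:0000002", "GO:0000004"]
print(precision_recall_at_k(ranked, annotated, k=3))
```

Under these assumptions, a protein with m annotated terms cannot exceed a precision of m/50 when all 50 ranked predictions are kept, so a reasonable recall at k = 50 can coexist with a low precision.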
1 Introduction
The recent completion of the annotation of the human proteome revealed that
out of the 20'400 protein-coding genes that exist in the human genome, about
5'000 of them lack experimentally validated functional information. To date,
many computational methods have been developed to predict gene function [10].
These methods do not exclusively rely on sequence homology and domain com-
position but also integrate contextual information, which is beyond the protein
sequence itself. Within this category, algorithms for protein function prediction
have been developed based on genomic context [1], protein-protein interaction
networks [17], and phenotypic profiling [11].
In this work we propose to tackle the problem of human protein function
prediction by three distinct supervised learning schemes: one-versus-all
classification; tournament learning; multi-label learning (MLL). The target classes
 