chosen because they utilize data that are widely available and usually produce reasonably
consistent and relatively good predictions.
Other approaches also exist that use different sources of data, including protein structure
[16-18], genomic context [19], gene expression [20, 21], text mining [22, 23] and integration
of multiple data sources [24-28]. A comprehensive review of available PFP methods can be
found in Hawkins and Kihara [29], while Sharan et al. [30] provide an excellent review of
the technical aspects of approaches that use protein-protein interaction networks for PFP.
A recent publication also describes a large-scale comparison of many well-established
gene function prediction methods on Mus musculus genes [31].
9.2.1 Annotation schemes
Automated function prediction is feasible only when there is a systematic way
of assigning functional annotations [32]. Several gene/protein function annotation
schemes have been used. One of the earliest standardized schemes is the EC nomenclature
[33] developed by the Enzyme Commission of the International Union of Biochemistry and
Molecular Biology in the 1950s for classifying enzymes based on their chemical properties.
The Structural Classification of Proteins (SCOP) [34] was developed in 1995 to classify proteins
based on structural and phylogenetic relationships. The first generalized scheme for classifying
protein function was introduced in 1993 for classifying Escherichia coli proteins [35]. These
classification schemes annotate either a subset of proteins, specific genomes or particular
aspects of proteins.
9.2.1.1 FunCat
A more comprehensive functional categorization scheme is the Functional Catalogue (FunCat)
[32], developed by the Munich Information Center for Protein Sequences (MIPS) [36].
The FunCat comprises a number of main functional categories (28 in version 2.1, which
is the most current at the time of writing) which describe a wide range of general gene
functions. Each category consists of a number of gene functional descriptions arranged in a
hierarchical structure referred to as a tree in computer science terminology. The annotation
term at the top of each hierarchy (or the root of the tree) is the most general description of
the category, with children terms describing different and more specific forms of their parent
term. The trees can span up to six levels in depth. The scheme was originally used for the
annotation of the Saccharomyces cerevisiae genes, but is generic enough to be extended to
other species. A subset of the FunCat annotation scheme is presented in Figure 9.1.
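
The tree structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes that FunCat terms are identified by dot-separated codes such as 01.01.03, where each additional field marks one level deeper in the hierarchy (the text above does not specify the code format), and the function names ancestors, depth and build_tree are hypothetical helpers, not part of any MIPS software.

```python
from collections import defaultdict


def ancestors(code):
    """Return the ancestor codes of a term, from the root down to its parent.

    Assumes dot-separated hierarchical codes, e.g. "01.01.03" (an assumption
    about the identifier format, not taken from the text).
    """
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def depth(code):
    """Level of a term in its tree; a main category (root) has depth 1."""
    return code.count(".") + 1


def build_tree(codes):
    """Map each parent code to a sorted list of its direct children.

    Root terms (main categories) are stored under the key None.
    Missing intermediate terms are created implicitly from the codes.
    """
    children = defaultdict(set)
    for code in codes:
        parent = None
        for node in ancestors(code) + [code]:
            children[parent].add(node)
            parent = node
    return {parent: sorted(kids) for parent, kids in children.items()}


# Illustrative codes only; they are not copied from the FunCat release.
example_codes = ["01.01.03", "01.02", "02"]
tree = build_tree(example_codes)

print(tree[None])             # main categories (roots of the trees)
print(tree["01"])             # direct children of category 01
print(ancestors("01.01.03"))  # more general terms implied by 01.01.03
```

Because every child term is a more specific form of its parent, a gene annotated with a deep term can also be treated as annotated with each of that term's ancestors; ancestors() returns exactly that list for one code.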
The FunCat scheme can be downloaded from ftp://ftpmips.gsf.de/catalogue/ in both
plain text and XML formats. FunCat annotations for a handful of genomes, including
S. cerevisiae, Fusarium graminearum and Arabidopsis thaliana, can be downloaded from
ftp://ftpmips.gsf.de/catalogue/annotation_data/.
9.2.1.2 Gene ontology
The FunCat annotation scheme is not widely adopted in databases other than those
maintained by MIPS. A more extensively utilized annotation scheme for gene and protein
functions is Gene Ontology (GO) [37]. GO was initiated as a collaborative effort in 1998 to