Prediction of Protein Function - Genomics: Essential Methods


9.2.4.3 Phylogenomics

Another group of approaches that uses phylogenetic relationships for functional inference involves the reconstruction and in-depth analysis of evolutionary history, commonly referred to as phylogenomics [67-69]. Resampled Inference of Orthologs (RIO) [69] uses bootstrap-resampled phylogenetic trees to improve orthologue discovery, which can reduce errors in functional inference. Statistical Inference of Function Through Evolutionary Relationships (SIFTER) [68] builds a phylogenetic tree from the homologues of a query protein and annotates speciation and duplication events in the tree. Known functional annotations within the tree are then propagated using a Bayesian approach to assign posterior probabilities of functional annotations to each node. The source code for SIFTER, implemented in Java, is available for download at http://sifter.berkeley.edu/.
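The idea of propagating annotations over a tree with labelled speciation and duplication events can be illustrated with a toy calculation. The sketch below is not SIFTER's actual model: the tree, the single binary function, and the change probabilities in `P_CHANGE` are all hypothetical, and the posterior is obtained by brute-force enumeration, which is only feasible for very small trees. It assumes that a function is more likely to change along branches below a duplication than below a speciation.

```python
from itertools import product

# Hypothetical probability that the function state flips along a branch,
# depending on the event at the parent node (illustrative values only).
P_CHANGE = {"speciation": 0.05, "duplication": 0.30}

# Toy tree: child -> parent. The root is a duplication; each copy then
# passes through a speciation.
PARENT = {
    "spec1": "root", "spec2": "root",
    "A": "spec1", "query": "spec1",
    "B": "spec2", "C": "spec2",
}
EVENT = {"root": "duplication", "spec1": "speciation", "spec2": "speciation"}

# Known annotations: 1 = has the function, 0 = lacks it.
OBSERVED = {"A": 1, "B": 0, "C": 0}


def joint(states, root_prior=0.5):
    """Joint probability of one complete assignment of states to all nodes."""
    p = root_prior  # symmetric prior, so it is the same for either root state
    for child, parent in PARENT.items():
        change = P_CHANGE[EVENT[parent]]
        p *= change if states[child] != states[parent] else 1.0 - change
    return p


def posterior(node):
    """P(node has the function | observed annotations), by enumeration."""
    hidden = [n for n in ["root", *PARENT] if n not in OBSERVED]
    with_function = total = 0.0
    for values in product((0, 1), repeat=len(hidden)):
        states = {**OBSERVED, **dict(zip(hidden, values))}
        p = joint(states)
        total += p
        if states[node] == 1:
            with_function += p
    return with_function / total


print(f"P(query has function) = {posterior('query'):.3f}")
```

In this toy setting the query inherits a high posterior from its annotated orthologue A, while the paralogues B and C, separated from it by the duplication, pull the estimate down only slightly.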

9.2.5 Sequence-derived functional and chemical properties

Homology-based methods such as those described above work very well when annotated homologues of the query sequence can be found. However, such approaches are severely limited otherwise. In cases where no or few annotated homologues can be found, it may still be possible to infer a protein's functions from its sequence. A protein's sequence contains vital information that governs its structure and function. For example, a protein involved in signal transduction is likely to have many phosphorylation sites, while a protein involved in DNA binding is likely to be localized to the nucleus [70]. The presence of phosphorylation sites and subcellular localization, as well as many other physical and chemical characteristics of a protein, can be derived or predicted from protein sequences and exploited for functional inference.
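As a minimal sketch of what "sequence-derived" means in practice, the function below computes a few simple descriptors directly from an amino acid string. The function name `sequence_features` and the choice of descriptors are assumptions made for illustration. In particular, the fraction of S/T/Y residues is only a crude stand-in for phosphorylation-site prediction, which in real tools relies on trained predictors. Hydropathy values follow the Kyte-Doolittle scale, and only the 20 standard amino acids are handled.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq):
    """Return a dict of simple descriptors computed from a protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)
    return {
        "length": n,
        # Residues that can in principle be phosphorylated (S, T, Y).
        "frac_phosphorylatable": (counts["S"] + counts["T"] + counts["Y"]) / n,
        # Crude net charge: basic (K, R) minus acidic (D, E) residues.
        "net_charge": (counts["K"] + counts["R"]) - (counts["D"] + counts["E"]),
        # Average hydropathy over the whole sequence.
        "mean_hydropathy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
    }


features = sequence_features("KKSTDE")
print(features)
```

Descriptors of this kind can be concatenated into a fixed-length feature vector, regardless of sequence length, which is what makes them suitable inputs for a classifier.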

ProtFun [71] characterizes each protein by 17 sequence-derived features, including predicted post-translational modifications (PTMs), protein sorting signals, secondary structure, and physical/chemical properties calculated from the amino acid composition. These properties are then used as features to perform supervised learning for function prediction using artificial neural networks. A model is built for each function by learning from labeled examples (annotated proteins). Subsequently, given a protein sequence, the same features can be derived and classified by each model to predict whether the protein has the function represented by that model. This approach was shown to work reasonably well where homology-dependent approaches fail due to the absence of well-annotated homologues. ProtFun is available as a web service at http://www.cbs.dtu.dk/services/ProtFun/.
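The one-model-per-function scheme can be sketched as follows. This is not ProtFun's implementation: a small logistic regression trained by stochastic gradient descent stands in for the artificial neural networks, and the two features, the four training proteins, and their labels are invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss with respect to z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical feature vectors: [fraction of phosphorylatable residues,
# predicted nuclear-localization score]. Values are made up.
X = [[0.30, 0.1], [0.28, 0.2], [0.08, 0.9], [0.10, 0.8]]

# One binary label vector per function (1 = protein is annotated with it).
LABELS = {
    "signal transduction": [1, 1, 0, 0],
    "DNA binding": [0, 0, 1, 1],
}

# Train an independent binary classifier for each function.
models = {func: train_logistic(X, y) for func, y in LABELS.items()}


def predict_functions(x):
    """Score a new feature vector against every per-function model."""
    return {func: predict_proba(model, x) for func, model in models.items()}


print(predict_functions([0.09, 0.85]))
```

Because each function has its own classifier, a protein can receive high scores for several functions at once, or for none.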

A similar approach is taken in ProtSVM [72], which uses sequence-derived properties to train SVMs that can assign a protein sequence to 47 enzyme families. ProtSVM has since been updated to include a wide range of functional families, such as lipid transport and immune-response proteins, and can be accessed at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.

Lobley et al. [73] propose a model that extends ProtFun by introducing new features that encode disordered regions predicted by the DISOPRED server [74]. Disordered regions are regions in proteins that do not have a stable, well-defined tertiary structure in their native states [75]. It was discovered that proteins annotated with different functions exhibit distinguishable biases in the distribution of both the lengths and locations of disordered regions [73]. An SVM classifier is built for each GO term using these features. Based on this approach, an online function prediction server, FFPred [70], is made available at
