Biomedical Engineering Reference
9.2.4.3 Phylogenomics
Another group of approaches that use phylogenetic relationships for functional inference
involves the reconstruction and in-depth analysis of evolutionary history, commonly referred
to as phylogenomics [67-69]. Resampled Inference of Orthologs (RIO) [69] uses
bootstrap-resampled phylogenetic trees to improve orthologue discovery, which can
reduce errors in functional inference. Statistical Inference of Function Through Evolutionary
Relationships (SIFTER) [68] builds a phylogenetic tree from the homologues of a
query protein and annotates speciation and duplication events in the tree. Known functional
annotations within the tree are then propagated using a Bayesian approach to assign
posterior probabilities of functional annotations to each node. The source code for SIFTER,
implemented in Java, is available for download at http://sifter.berkeley.edu/.
9.2.5 Sequence-derived functional and chemical properties
Homology-based methods such as those described above work very well when annotated
homologues of the query sequence can be found. However, such approaches are severely
limited otherwise. In cases where no or few annotated homologues can be found, it may still
be possible to infer a protein's functions from its sequence. A protein's sequence contains
vital information that governs its structure and function. For example, a protein involved in
signal transduction is likely to have many phosphorylation sites, while a protein involved in
DNA binding is likely to be localized to the nucleus [70]. The presence of phosphorylation
sites and subcellular localization, as well as many other physical and chemical characteristics
of a protein, can be derived or predicted from protein sequences and exploited for
functional inference.
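As a rough illustration of what "sequence-derived" means in practice, the sketch below computes a few simple descriptors directly from an amino acid string: residue composition, mean Kyte-Doolittle hydropathy, the fraction of Ser/Thr/Tyr residues and a crude net charge. The function name and the choice of descriptors are assumptions made for this example. In particular, the Ser/Thr/Tyr fraction only counts residues that could in principle be phosphorylated; it is not a phosphorylation-site predictor.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq):
    """Compute simple descriptors from a protein sequence (illustrative only)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    n = len(seq)
    counts = Counter(seq)
    features = {
        # Fraction of each standard residue.
        "composition": {aa: counts[aa] / n for aa in KYTE_DOOLITTLE},
        # Mean hydropathy over the whole sequence.
        "mean_hydropathy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        # Residues that can carry a phosphate group (not predicted sites).
        "sty_fraction": (counts["S"] + counts["T"] + counts["Y"]) / n,
        # Crude charge estimate: basic minus acidic residues, His ignored.
        "net_charge": counts["K"] + counts["R"] - counts["D"] - counts["E"],
    }
    return features


example = sequence_features("MKKSTY")
print(example["sty_fraction"], example["net_charge"])
```

Descriptors of this kind can be concatenated into a fixed-length vector, which is the form that the learning methods described in this section require.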
ProtFun [71] characterizes each protein using 17 sequence-derived features, including
predicted post-translational modifications (PTMs), protein sorting signals, secondary
structure and physical/chemical properties calculated from the amino acid composition.
These properties are then used as features for supervised learning of function
prediction with artificial neural networks. A model is built for each function
by learning from labeled examples (annotated proteins). Given a new protein
sequence, the same features are derived and passed to each model to predict whether the
protein has the function that the model represents. This approach was shown to work
reasonably well where homology-dependent approaches fail due to the absence of
well-annotated homologues. ProtFun is available as a web service at http://www.cbs.dtu.dk/services/ProtFun/.
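The "one model per function" scheme can be sketched as follows. This is not the ProtFun implementation: a small logistic regression trained by stochastic gradient descent stands in for the artificial neural networks, and the two-dimensional feature vectors, function labels and helper names are made up for illustration. What the sketch preserves is the structure of the approach, namely that each function gets its own binary classifier and a query protein is scored by every classifier independently.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit a logistic regression by plain SGD; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            grad = p - yi  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * grad * xj for wj, xj in zip(w, xi)]
            b -= lr * grad
    return w, b


def train_per_function(X, annotations, functions):
    """Train one binary model per function from annotated proteins."""
    models = {}
    for func in functions:
        # A protein is a positive example if it carries this annotation.
        y = [1 if func in ann else 0 for ann in annotations]
        models[func] = train_logistic(X, y)
    return models


def score_functions(models, x):
    """Score a feature vector against every per-function model."""
    return {
        func: _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
        for func, (w, b) in models.items()
    }


# Toy training set: feature vectors and annotation sets are hypothetical.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
annotations = [{"signalling"}, {"signalling"}, {"dna_binding"}, {"dna_binding"}]
models = train_per_function(X, annotations, ["signalling", "dna_binding"])
print(score_functions(models, [0.95, 0.05]))
```

Because each model is trained independently, a protein can score highly for several functions at once, which suits the multi-label nature of functional annotation.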
A similar approach is taken in SVM-Prot [72], which uses sequence-derived properties
to train SVMs that can assign a protein sequence to 47 enzyme families. SVM-Prot has
since been updated to include a wide range of functional families, such as lipid transport
and immune-response proteins, and can be accessed at
http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.
Lobley et al. [73] propose a model that extends ProtFun by introducing new
features that encode disordered regions predicted by the DISOPRED server [74]. Disordered
regions are regions in proteins that lack a stable, well-defined tertiary structure in
their native states [75]. Proteins annotated with different functions were found to
exhibit distinguishable biases in the distribution of both the lengths and locations of
disordered regions [73]. An SVM classifier is built for each GO term using these features. Based
on this approach, an online function prediction server FFPred [70] is made available at
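One way to turn per-residue disorder predictions into length and location features is sketched below. The input format (a string with "D" for disordered and "O" for ordered residues), the length thresholds and the division of the sequence into thirds are all assumptions made for this example. They do not reproduce the encoding used by Lobley et al. [73] or the output format of any particular predictor.

```python
from itertools import groupby


def disordered_regions(mask):
    """Return half-open (start, end) index pairs for each run of 'D' in mask."""
    regions = []
    pos = 0
    for flag, run in groupby(mask):
        length = len(list(run))
        if flag == "D":
            regions.append((pos, pos + length))
        pos += length
    return regions


def disorder_features(mask, length_bins=(10, 30)):
    """Count disordered regions by length class and by position in the chain."""
    n = len(mask)
    if n == 0:
        raise ValueError("mask must not be empty")
    features = {"short": 0, "medium": 0, "long": 0,
                "n_term": 0, "middle": 0, "c_term": 0}
    regions = disordered_regions(mask)
    for start, end in regions:
        length = end - start
        # Length class (thresholds are hypothetical).
        if length < length_bins[0]:
            features["short"] += 1
        elif length < length_bins[1]:
            features["medium"] += 1
        else:
            features["long"] += 1
        # Location class, based on the relative midpoint of the region.
        midpoint = (start + end) / 2 / n
        if midpoint < 1 / 3:
            features["n_term"] += 1
        elif midpoint < 2 / 3:
            features["middle"] += 1
        else:
            features["c_term"] += 1
    features["fraction_disordered"] = sum(e - s for s, e in regions) / n
    return features


# Toy mask: a short N-terminal and a medium C-terminal disordered region.
mask = "D" * 5 + "O" * 40 + "D" * 15
print(disorder_features(mask))
```

The resulting counts can be appended to other sequence-derived features before a per-term classifier is trained.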