Prediction of Protein Function - Genomics: Essential Methods

http://bioinf.cs.ucl.ac.uk/ffpred/ for the automated prediction of over 300 GO annotations

for an input protein sequence.

9.2.6 Protein-protein interaction maps

Protein sequences may encode useful information on the characteristics of a protein but

offer little clue to its interaction behaviour. Proteins do not work alone, but interact with

DNA, RNA and other proteins in complexes and pathways. Hence, an important source of

evidence that can suggest the type of biological process to which a protein contributes

is a protein interaction map.

Protein-protein interaction data can be obtained from many databases. The Molecular

Interaction database (MINT) [76] at http://mint.bio.uniroma2.it/mint/ contains over 100 000

physical interactions curated from peer-reviewed journals. BioGRID [44] at

http://www.thebiogrid.org/ is one of the largest protein-protein interaction databases, with over 200 000

physical and genetic interactions curated from the literature, and has curated the complete

set of interactions available in the literature for S. cerevisiae and Schizosaccharomyces

pombe. Other databases from which protein-protein interaction data can be obtained include

the Database of Interacting Proteins (DIP) at http://dip.doe-mbi.ucla.edu/ and the Human

Protein Reference Database (HPRD) at http://www.hprd.org/. Protocol 9.3 provides details

on obtaining protein-protein interaction data from BioGRID.

Although protein-protein interactions are binary relationships (i.e. interact or not),

protein-protein interactions in databases vary in reliability. Protein-protein interactions can

be observed from many types of experiment, such as two-hybrid [77], immunoprecipitation

and tandem affinity purification. Two-hybrid experiments are seemingly more susceptible to

noise and have been reported to suffer from high false positive rates [13, 78]. Co-purification

analysis, on the other hand, tends to be more reliable. Some computational methods for

reducing noise in protein-protein interaction data are discussed by Chua and Wong [79].
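
As a rough illustration of how evidence type can be taken into account, the sketch below builds an interaction map from a list of interaction records and optionally leaves out records from chosen experiment types. The record layout (protein A, protein B, method), the method labels and the function name build_interaction_map are hypothetical and do not reflect the file format or vocabulary of any particular database. Excluding a whole experiment type is only one crude option, and it is not the approach described in [79].

```python
from collections import defaultdict


def build_interaction_map(records, excluded_methods=frozenset()):
    """Build an undirected neighbor map from (protein_a, protein_b, method) records.

    Records whose method is listed in `excluded_methods` are skipped, as are
    self-interactions. The record layout is an assumption made for this sketch.
    """
    neighbors = defaultdict(set)
    for protein_a, protein_b, method in records:
        if method in excluded_methods or protein_a == protein_b:
            continue
        # Interactions are treated as symmetric, so both directions are stored.
        neighbors[protein_a].add(protein_b)
        neighbors[protein_b].add(protein_a)
    return dict(neighbors)


# Hypothetical toy records; protein names and method labels are illustrative only.
records = [
    ("P1", "P2", "two-hybrid"),
    ("P1", "P2", "affinity purification"),
    ("P1", "P3", "two-hybrid"),
    ("P2", "P4", "affinity purification"),
]

all_interactions = build_interaction_map(records)
filtered = build_interaction_map(records, excluded_methods={"two-hybrid"})
print(filtered)
```

In this toy data the P1-P2 interaction survives the filter because it is also supported by a co-purification record, whereas P1-P3, seen only by two-hybrid, is dropped.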

9.2.6.1 Interacting partners

A simple yet effective method to infer the function of a protein using protein-protein

interactions is to compute the frequency of each function among its interaction partners [80];

this approach is widely referred to as neighbor counting. The function with the highest frequency is

assigned to the protein.
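
Neighbor counting as described above can be sketched in a few lines. This is a minimal illustration, not the implementation of [80]: the dictionaries interactions (protein to set of partners) and annotations (protein to set of functions), the function name neighbor_counting and the toy data are all assumptions made for the example.

```python
from collections import Counter


def neighbor_counting(protein, interactions, annotations):
    """Rank functions by their frequency among the interaction partners of `protein`.

    Returns a list of (function, count) pairs, most frequent first.
    """
    counts = Counter()
    for partner in interactions.get(protein, set()):
        # Unannotated partners contribute nothing to the counts.
        counts.update(annotations.get(partner, set()))
    return counts.most_common()


# Hypothetical toy data: P1 is the unannotated query protein.
interactions = {"P1": {"P2", "P3", "P4"}}
annotations = {
    "P2": {"DNA repair", "transcription"},
    "P3": {"DNA repair"},
    "P4": {"transport"},
}

ranked = neighbor_counting("P1", interactions, annotations)
print(ranked[0])  # the function with the highest frequency and its count
```

Here "DNA repair" occurs in two of the three partners and would be assigned to P1; the other two functions each occur once.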

Functional terms that are annotated to a larger number of proteins tend to appear in

larger numbers in the neighborhood of a protein. Hence, the neighbor counting method

may preferentially assign to proteins those functions that have high background

frequencies. Hishigaki et al. [81] identified this problem and proposed using the chi-square

statistical measure as a scoring function instead. For each function annotated to the neighbors

of a protein, the deviation of its observed occurrence from its expected occurrence is used

as a score to assign that function to the protein. Given protein u, function x is assigned

to u with the score

$$S_x(u) = \frac{(f_x(u) - e_x(u))^2}{e_x(u)}$$

where f_x(u) is the observed frequency of x in the neighbors of u and e_x(u) is the expected

frequency of x in the neighbors of u based on its background frequency.
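
The chi-square score can be illustrated with a short sketch. The text does not spell out how the expected frequency is computed, so the sketch assumes that e_x(u) equals the number of neighbors of u multiplied by the fraction of annotated proteins that carry function x. The function name chi_square_scores, the data structures and the toy data are likewise hypothetical.

```python
from collections import Counter


def chi_square_scores(protein, interactions, annotations):
    """Score each function seen among the neighbors of `protein` by
    S_x(u) = (f_x(u) - e_x(u))**2 / e_x(u).

    Assumption: e_x(u) = (number of neighbors of u) * (fraction of annotated
    proteins that carry function x).
    """
    neighbors = interactions.get(protein, set())

    # Background count of each function over all annotated proteins.
    background = Counter()
    for functions in annotations.values():
        background.update(functions)

    # Observed count f_x(u) of each function among the neighbors.
    observed = Counter()
    for partner in neighbors:
        observed.update(annotations.get(partner, set()))

    scores = {}
    for function, f in observed.items():
        e = len(neighbors) * background[function] / len(annotations)
        scores[function] = (f - e) ** 2 / e
    return scores


# Hypothetical toy data: "metabolism" is a very common background function.
annotations = {
    "P2": {"DNA repair", "metabolism"},
    "P3": {"DNA repair", "metabolism"},
    "P4": {"metabolism"},
    "P5": {"metabolism"},
    "P6": {"metabolism"},
    "P7": {"metabolism"},
    "P8": {"metabolism"},
    "P9": {"metabolism"},
    "P10": {"transport"},
    "P11": {"transport"},
}
interactions = {"P1": {"P2", "P3", "P4", "P5"}}

scores = chi_square_scores("P1", interactions, annotations)
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy data "metabolism" appears in all four neighbors of P1, so neighbor counting would select it. Because it is expected in 3.2 of the four neighbors from its background frequency alone, its score is only about 0.2, whereas "DNA repair" (observed 2, expected 0.8) scores about 1.8 and ranks first.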
