Biomedical Engineering Reference
In-Depth Information
http://bioinf.cs.ucl.ac.uk/ffpred/ for the automated prediction of over 300 GO annotations
for an input protein sequence.
9.2.6 Protein-protein interaction maps
Protein sequences may encode useful information on the characteristics of a protein but
offer little clue about its interaction behaviour. Proteins do not work alone, but interact with
DNA, RNA and other proteins in complexes and pathways. Hence, a protein interaction map
is an important source of evidence for the types of biological process to which a protein
contributes.
Protein-protein interaction data can be obtained from many databases. The Molecular
Interaction database (MINT) [76] at http://mint.bio.uniroma2.it/mint/ contains over 100 000
physical interactions curated from peer-reviewed journals. BioGRID [44] at
http://www.thebiogrid.org/ is one of the largest protein-protein interaction databases, with over 200 000
physical and genetic interactions curated from the literature, and has curated the complete
set of interactions available in the literature for S. cerevisiae and Schizosaccharomyces
pombe. Other databases from which protein-protein interaction data can be obtained include
the Database of Interacting Proteins (DIP) at http://dip.doe-mbi.ucla.edu/ and the Human
Protein Reference Database (HPRD) at http://www.hprd.org/. Protocol 9.3 provides details
on obtaining protein-protein interaction data from BioGRID.
Although a protein-protein interaction is in principle a binary relationship (two proteins
either interact or they do not), the interactions recorded in databases vary in reliability.
Protein-protein interactions can be observed in many types of experiment, such as two-hybrid [77],
immunoprecipitation and tandem affinity purification. Two-hybrid experiments appear to be more
susceptible to noise and have been reported to suffer from high false positive rates [13, 78].
Co-purification analysis, on the other hand, tends to be more reliable. Computational methods for
reducing noise in protein-protein interaction data are discussed in Chua and Wong [79].
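The idea of restricting an interaction map to more reliable evidence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, protein names and evidence labels below are hypothetical and do not reflect the export format of any particular database.

```python
from collections import defaultdict

# Hypothetical records: (protein A, protein B, experimental system).
# Names and evidence labels are made up for illustration only.
RECORDS = [
    ("P1", "P2", "two-hybrid"),
    ("P1", "P3", "affinity purification"),
    ("P2", "P3", "two-hybrid"),
    ("P3", "P4", "affinity purification"),
    ("P4", "P4", "two-hybrid"),  # self-interaction
]


def build_interaction_map(records, allowed_systems=None):
    """Return an undirected adjacency map {protein: set of partners}.

    If allowed_systems is given, only records whose experimental system
    is in that set are kept. Self-interactions are skipped.
    """
    neighbors = defaultdict(set)
    for a, b, system in records:
        if allowed_systems is not None and system not in allowed_systems:
            continue
        if a == b:
            continue
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)


full_map = build_interaction_map(RECORDS)
filtered_map = build_interaction_map(RECORDS, {"affinity purification"})
```

Filtering by evidence type is a crude stand-in for the noise-reduction methods cited above; it simply discards every interaction that is not supported by one of the chosen experiment types.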
9.2.6.1 Interacting partners
A simple yet effective method to infer the function of a protein from protein-protein
interactions is to compute the frequency of each function among its interaction partners [80];
this approach is widely referred to as neighbor counting. The function with the highest
frequency is assigned to the protein.
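Neighbor counting as described above might be sketched as follows. The function name, the data layout (a dictionary of partner sets and a dictionary of annotation sets) and the toy data are assumptions made for illustration, not part of the cited method's original implementation.

```python
from collections import Counter


def neighbor_counting(protein, neighbors, annotations):
    """Rank functions by how many interaction partners of `protein` carry them.

    neighbors:   {protein: set of interaction partners}
    annotations: {protein: set of functions}
    Returns a list of (function, count) pairs, highest count first.
    """
    counts = Counter()
    for partner in neighbors.get(protein, ()):
        # Each partner contributes at most one count per function.
        counts.update(annotations.get(partner, ()))
    return counts.most_common()


# Hypothetical toy data: protein "u" has three annotated partners.
NEIGHBORS = {"u": {"a", "b", "c"}}
ANNOTATIONS = {
    "a": {"transport", "binding"},
    "b": {"transport"},
    "c": {"binding", "transport"},
}

ranked = neighbor_counting("u", NEIGHBORS, ANNOTATIONS)
predicted_function = ranked[0][0]
```

Here "transport" occurs in all three partners and "binding" in two, so "transport" would be assigned to u.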
Functional terms that are annotated to a larger number of proteins tend to appear more
often in the neighborhood of any protein. Hence, the neighbor counting method
may preferentially assign functions that have high background
frequencies. Hishigaki et al. [81] identified this problem and proposed using the chi-square
statistic as a scoring function instead. For each function annotated to the neighbors
of a protein, the deviation of its observed occurrence from its expected occurrence is used
as a score to assign that function to the protein. Given protein u, function x is assigned
to u with the score
$$S_x(u) = \frac{(f_x(u) - e_x(u))^2}{e_x(u)}$$
where $f_x(u)$ is the observed frequency of x in the neighbors of u and $e_x(u)$ is the expected
frequency of x in the neighbors of u based on its background frequency.
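A minimal sketch of this chi-square scoring is given below. It assumes that the expected frequency is the number of neighbors of u multiplied by the fraction of all annotated proteins that carry function x; that reading of "background frequency", the function name and the toy data are assumptions for illustration.

```python
from collections import Counter


def chi_square_scores(protein, neighbors, annotations):
    """Score each function seen among the partners of `protein`.

    Assumption: expected count e = (number of partners) *
    (fraction of annotated proteins carrying the function).
    Returns {function: (f - e)^2 / e}.
    """
    partners = neighbors.get(protein, set())
    n_partners = len(partners)
    n_proteins = len(annotations)

    # Background: how many annotated proteins carry each function.
    background = Counter()
    for functions in annotations.values():
        background.update(functions)

    # Observed: how many partners carry each function.
    observed = Counter()
    for partner in partners:
        observed.update(annotations.get(partner, ()))

    scores = {}
    for function, f in observed.items():
        e = n_partners * background[function] / n_proteins
        scores[function] = (f - e) ** 2 / e
    return scores


# Hypothetical toy data: "common" is annotated to 8 of 10 proteins,
# "rare" to only 2, and both "rare" proteins are partners of "u".
ANNOTATIONS = {
    "a": {"common", "rare"}, "b": {"common", "rare"}, "c": {"common"},
    "d": {"common"}, "e": {"common"}, "f": {"common"},
    "g": {"common"}, "h": {"common"}, "i": {"other"}, "j": {"other"},
}
NEIGHBORS = {"u": {"a", "b", "c"}}

scores = chi_square_scores("u", NEIGHBORS, ANNOTATIONS)
best = max(scores, key=scores.get)
```

In this toy case neighbor counting would pick "common" (three partners against two), whereas the chi-square score favours "rare", whose observed count exceeds its expected count by a much larger margin. Because the deviation is squared, a function that occurs less often than expected also receives a positive score, so an implementation may want to check the sign of f - e before assigning.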