that the focus is shifted from a single target to a systemwide approach for identifying predictors to classify complex biological systems. However, some studies suggest that many of the tools used for these studies are still not mature enough to allow for reliable and robust analysis. The recent work performed by Staiger et al. 38 highlights some of these shortcomings. They reevaluated three studies by Chuang et al., 39 Lee et al., 40 and Taylor et al. 41 using their own methodology and concluded that combining expression and protein-protein interaction data does not improve predictability over single-gene classifiers. There could be several reasons for this discrepancy, such as a difference in the performance of the algorithm used and/or the low quality of the network and pathway data. It is known that a significant portion of the reported interactions in these databases is not reliable, which will cause problems when they are used in conjunction with other data for classification and biomarker discovery.

PROTEIN-PROTEIN INTERACTION DATABASES

Although the application of protein interaction data to aid in the identification of biomarkers is increasing (Table 1), there are still significant problems associated with the available protein-protein interaction data. These problems are mainly due to the high number of false positive interactions reported, errant curation, and the incompleteness of the mapping of the human interactome. It has been estimated that only around 33,000 out of 130,000 binary interactions have been mapped. 42 This estimate is probably conservative because it does not account for differential interactions between alternative splice variants of the same protein or differential complex formation under diverse physiological conditions.

There are several publicly available protein-protein interaction databases such as the Database of Interacting Proteins (DIP), 43 the Biomolecular Interaction Network Database (BIND), 44 the Molecular Interaction Database (MINT), 45 the Protein Interaction Database (IntAct), 46 the Munich Information Center for Protein Sequence (MIPS), 47 the Biological General Repository for Interaction Datasets (BIOGRID), 48 and the Human Protein Interaction Database (HPRD). 49 Some of these databases are dedicated to human protein-protein interactions and others report interactions from other species and are not limited to physical interactions. 50 Although they all harbor protein-protein interaction data, there is a significant difference in their scope, how the data are represented, and the purpose behind the creation of the database.

The vast majority of the protein interaction data reported comes from two sources: direct deposit of medium- and high-throughput discovery experiments using either the yeast 2-hybrid system or affinity purification followed by MS, and manual and/or computer curation of the literature. It has been generally accepted that the reliability of the curated data is significantly better than the high-throughput data, as it often involves unbiased annotation by an experienced scientist rather than the depositing of unvalidated high-throughput data. Interestingly, the clear superiority of curated data has recently been challenged. A study by Cusick et al. 51 demonstrated that manual curation can be error-prone. They suggest that one of the primary reasons is that it is difficult to extract information from long complicated texts; in particular, the gene names and inconsistent scoring create significant problems. Another significant shortcoming of protein-protein interaction databases is highlighted in a study by Turinsky et al. 52 They compared the concurrence between nine major public databases: BIND, BioGrid, CORUM, DIP, HPRD, IntAct, MINT, MPact, and MPPI. Based on a total of 15,471 shared publications, which represented approximately 36% of all the cited publications, the average agreement between any two databases is only