method), 53 computes this reliability by estimating the likelihood that the two
proteins also have paralogs that are known to interact.
In many cases, the structure of the protein-protein interaction network is
itself able to provide useful information about the likelihood of an interaction
between two proteins. Such information may be in the form of the inter-
connection patterns of these proteins with other proteins in the network. An
important structural feature of networks that provides a more robust estimate
of the reliability is the concept of common neighborhood, that is, the set of
proteins that are interaction partners of both of the proteins whose interaction is
being evaluated. Intuitively, the larger the number of common neighbors two
proteins have, the more likely they are to have a direct interaction between
them. Some approaches evaluate the reliability of a given interaction using a
common neighborhood-based similarity measure, such as Jaccard similarity or
a variation thereof. 54-56 A more systematic approach has been proposed
recently, 57 where the h-confidence measure 58 from the field of association
analysis 47 in data mining was used to quantify this
likelihood. For interaction networks, the h-confidence measure may be defined
as shown in Equation (8.1). Here, $N_{P_1}$ and $N_{P_2}$ denote the sets of
neighbors of $P_1$ and $P_2$, respectively.

\[
h\text{-confidence}(P_1, P_2) = \min\left( \frac{|N_{P_1} \cap N_{P_2}|}{|N_{P_1}|},\ \frac{|N_{P_1} \cap N_{P_2}|}{|N_{P_2}|} \right) \tag{8.1}
\]
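Equation (8.1) can be illustrated with a minimal Python sketch. The function name `h_confidence`, the dictionary-of-sets representation, and the toy network are assumptions made for illustration only; returning 0 when either protein has no neighbors is likewise a convention chosen here, since the equation is undefined in that case.

```python
def h_confidence(adjacency, p1, p2):
    """Unweighted h-confidence of a protein pair, following Equation (8.1).

    adjacency maps each protein to the set of its interaction partners
    (a hypothetical representation chosen for this sketch).
    """
    n1 = adjacency.get(p1, set())
    n2 = adjacency.get(p2, set())
    if not n1 or not n2:
        # Assumed convention: the ratio is undefined for an isolated protein.
        return 0.0
    common = len(n1 & n2)  # |N_P1 ∩ N_P2|
    return min(common / len(n1), common / len(n2))


# Toy undirected network (illustrative data, not taken from any database).
network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
}

score_ac = h_confidence(network, "A", "C")  # common neighbors: B and D
score_ab = h_confidence(network, "A", "B")  # common neighbor: C
```

Taking the minimum of the two ratios means a pair scores highly only when the shared neighbors make up a large fraction of both neighborhoods, so a high-degree protein does not receive a high score with every one of its low-degree partners.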
As defined above, h-confidence is only applicable to binary data or, in
the context of protein interaction graphs, to unweighted graphs. However,
the notion of h-confidence can be readily generalized to networks where the
edges carry real-valued weights indicating their reliability. In this case,
Equation (8.1) can be conveniently modified to calculate
h-confidence($P_1$, $P_2$) by making the following substitutions:
(1) $|N_{P_1}| \rightarrow$ sum of weights of edges incident on $P_1$
(similarly for $P_2$) and (2) $|N_{P_1} \cap N_{P_2}| \rightarrow$ sum of
minimum of weights of each pair of edges that are incident on a protein $P$
from both $P_1$ and $P_2$. In both of these cases, the h-confidence measure is
guaranteed to fall in the range [0, 1].
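The two substitutions for weighted networks can be sketched in Python as follows. The function name `weighted_h_confidence`, the nested-dictionary representation, and the example weights are hypothetical; the sketch also assumes non-negative edge weights, which is what keeps the result within [0, 1].

```python
def weighted_h_confidence(weights, p1, p2):
    """Weighted h-confidence using the two substitutions described above.

    weights maps each protein to a dict {neighbor: edge weight}
    (a hypothetical representation; weights are assumed non-negative).
    """
    w1 = weights.get(p1, {})
    w2 = weights.get(p2, {})
    total1 = sum(w1.values())  # replaces |N_P1|
    total2 = sum(w2.values())  # replaces |N_P2|
    if total1 == 0 or total2 == 0:
        # Assumed convention for proteins without weighted edges.
        return 0.0
    # Replaces |N_P1 ∩ N_P2|: for each shared neighbor P, take the smaller
    # of the two edge weights (P1, P) and (P2, P), then sum.
    shared = sum(min(w1[p], w2[p]) for p in w1.keys() & w2.keys())
    return min(shared / total1, shared / total2)


# Toy weighted network (illustrative reliabilities only).
weighted_network = {
    "A": {"B": 0.9, "C": 0.5, "D": 0.4},
    "B": {"A": 0.9, "C": 0.6},
    "C": {"A": 0.5, "B": 0.6, "D": 0.8},
    "D": {"A": 0.4, "C": 0.8},
}

weighted_score = weighted_h_confidence(weighted_network, "A", "C")
```

When every edge weight is 1, the sums reduce to neighbor counts and the function returns the same value as the unweighted definition in Equation (8.1).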
Now, with the hypothesis that a function of the number of common neighbors
indicates the reliability of an interaction between two proteins, two types
of interactions, namely spurious and potential interactions, can be
characterized in an interaction network as follows:
• Interactions in the network that have a low h-confidence score are likely
to be spurious.
• A pair of proteins that has a high pairwise h-confidence score is likely
to interact, even if the network currently does not contain an interaction
between them; such pairs are termed potential interactions.
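The two categories above can be sketched as a single pass over all protein pairs. The text does not specify what counts as a "low" or "high" score, so the thresholds `low` and `high` below are purely hypothetical parameters, as are the function names and the toy network.

```python
from itertools import combinations


def h_confidence(adjacency, p1, p2):
    """Unweighted h-confidence of a protein pair (Equation 8.1)."""
    n1, n2 = adjacency.get(p1, set()), adjacency.get(p2, set())
    if not n1 or not n2:
        return 0.0  # assumed convention for isolated proteins
    common = len(n1 & n2)
    return min(common / len(n1), common / len(n2))


def classify_interactions(adjacency, low=0.2, high=0.6):
    """Split protein pairs into likely spurious and potential interactions.

    low and high are hypothetical cutoffs chosen for this sketch only.
    """
    spurious, potential = [], []
    for p1, p2 in combinations(sorted(adjacency), 2):
        score = h_confidence(adjacency, p1, p2)
        if p2 in adjacency[p1]:
            # Existing edge with little common-neighborhood support.
            if score < low:
                spurious.append((p1, p2, score))
        elif score >= high:
            # No edge yet, but strongly overlapping neighborhoods.
            potential.append((p1, p2, score))
    return spurious, potential


# Toy network: E is attached only to A; B and D share both of their neighbors.
network = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"A", "C"},
    "E": {"A"},
}

spurious, potential = classify_interactions(network)
```

With these assumed thresholds, the edge A-E is flagged as likely spurious and the unconnected pair B-D as a potential interaction. Note that any edge to a protein with a single partner scores 0 under this measure, so cutoffs would need to be chosen with the network's degree distribution in mind.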
Available online at http://dip.doe-mbi.ucla.edu/dip/Services.cgi?SM=2. Accessed July 12, 2008.