Scientific Data Analysis - Scientific Data Management


method),[53]† computes this reliability by estimating the likelihood that the two

proteins also have paralogs that are known to interact.

In many cases, the structure of the protein-protein interaction network is

itself able to provide useful information about the likelihood of an interaction

between two proteins. Such information may be in the form of the inter-

connection patterns of these proteins with other proteins in the network. An

important structural feature of networks that provides a more robust estimate

of the reliability is the concept of common neighborhood, that is, the set of proteins

that are interaction partners of both of the proteins whose interaction is

being evaluated. Intuitively, the larger the number of common neighbors two

proteins have, the more likely they are to have a direct interaction between

them. Some approaches evaluate the reliability of a given interaction using a common-neighborhood-based similarity measure, such as Jaccard similarity or a variation thereof [54-56]. A more systematic approach has been proposed recently [57], in which the h-confidence measure [58] from the field of association analysis [47] in data mining was used to quantify this likelihood. For interaction networks, the h-confidence measure may be defined

as shown in Equation (8.1). Here, $N_{P_1}$ and $N_{P_2}$ denote the sets of neighbors of $P_1$ and $P_2$, respectively.

$$h\text{-}confidence(P_1, P_2) = \min\left( \frac{|N_{P_1} \cap N_{P_2}|}{|N_{P_1}|},\; \frac{|N_{P_1} \cap N_{P_2}|}{|N_{P_2}|} \right) \qquad (8.1)$$
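Equation (8.1) takes the smaller of the two ratios $|N_{P_1} \cap N_{P_2}| / |N_{P_1}|$ and $|N_{P_1} \cap N_{P_2}| / |N_{P_2}|$. The following is a minimal Python sketch of that computation, not the cited authors' implementation. The adjacency-dictionary representation, the function name `h_confidence`, the toy protein names, and the choice to return 0 when a protein has no neighbors are all assumptions made for illustration.

```python
def h_confidence(adj, p1, p2):
    """Unweighted h-confidence of p1 and p2, following Equation (8.1).

    adj maps each protein to the set of its interaction partners
    (a hypothetical representation chosen for this sketch).
    """
    n1 = adj.get(p1, set())
    n2 = adj.get(p2, set())
    if not n1 or not n2:
        # Assumption: the score is 0 when either neighbor set is empty,
        # a case Equation (8.1) leaves undefined.
        return 0.0
    common = len(n1 & n2)  # |N_P1 ∩ N_P2|
    return min(common / len(n1), common / len(n2))


# Toy network with hypothetical protein names.
adj = {
    "A": {"C", "D", "E"},
    "B": {"C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A"},
}

# A and B share the neighbors C and D, so the score is min(2/3, 2/2).
score = h_confidence(adj, "A", "B")
```

Taking the minimum means that a protein with many partners cannot obtain a high score merely because a small neighbor set happens to be contained in its large one.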

As defined above, h-confidence is only applicable to binary data or, in the context of protein interaction graphs, to unweighted graphs. However, the notion of h-confidence can be readily generalized to networks where the edges carry real-valued weights indicating their reliability. In this case, Equation (8.1) can be conveniently modified to calculate h-confidence($P_1$, $P_2$) by making the following substitutions: (1) $|N_{P_1}| \to$ sum of the weights of the edges incident on $P_1$ (similarly for $P_2$), and (2) $|N_{P_1} \cap N_{P_2}| \to$ sum of the minimum of the weights of each pair of edges that are incident on a protein $P$ from both $P_1$ and $P_2$. In both of these cases, the h-confidence measure is guaranteed to fall in the range [0, 1].
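The two substitutions for weighted networks can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the nested-dictionary representation, the function name `weighted_h_confidence`, the example weights, and the convention of returning 0 when a protein has zero total edge weight are all hypothetical.

```python
def weighted_h_confidence(w, p1, p2):
    """Weighted h-confidence of p1 and p2.

    w maps each protein to a dict {neighbor: edge weight}
    (a hypothetical representation chosen for this sketch).
    """
    e1 = w.get(p1, {})
    e2 = w.get(p2, {})
    # Substitution (1): total weight of the edges incident on each protein.
    s1 = sum(e1.values())
    s2 = sum(e2.values())
    if s1 == 0 or s2 == 0:
        # Assumption: the score is 0 when a protein has no weighted edges.
        return 0.0
    # Substitution (2): for each protein P adjacent to both p1 and p2,
    # add the smaller of the two edge weights (p1, P) and (p2, P).
    shared = sum(min(e1[p], e2[p]) for p in e1.keys() & e2.keys())
    return min(shared / s1, shared / s2)


# Toy weighted network with hypothetical names and reliabilities.
w = {
    "A": {"C": 0.9, "D": 0.4, "E": 0.5},
    "B": {"C": 0.6, "D": 0.8},
}
weighted_score = weighted_h_confidence(w, "A", "B")

# With every weight equal to 1, the measure reduces to Equation (8.1).
unit = {
    "A": {"C": 1, "D": 1, "E": 1},
    "B": {"C": 1, "D": 1},
}
unit_score = weighted_h_confidence(unit, "A", "B")
```

For non-negative weights, each shared term is at most the corresponding edge weight on either side, so the shared sum never exceeds either total and the score stays within [0, 1].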

Now, with the hypothesis that a function of the number of common neigh-

bors indicates the reliability of an interaction between two proteins, two types

of interactions, namely spurious and potential interactions, can be character-

ized in an interaction network as follows:

• Interactions in the network that have a low h-confidence score are likely to be spurious.

• A pair of proteins that has a high pairwise h-confidence score is likely to interact, even if the network currently does not contain an interaction between them; such pairs are termed potential interactions.
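The two characterizations above can be illustrated with a short Python sketch that scores every pair of proteins in an unweighted network. The thresholds `low` and `high`, the function names, and the toy network are hypothetical; the text does not specify cutoff values, so in practice they would need to be chosen for the data set at hand.

```python
from itertools import combinations


def h_confidence(adj, p1, p2):
    """Unweighted h-confidence per Equation (8.1); 0 if a neighbor set is empty."""
    n1, n2 = adj.get(p1, set()), adj.get(p2, set())
    if not n1 or not n2:
        return 0.0
    common = len(n1 & n2)
    return min(common / len(n1), common / len(n2))


def classify_interactions(adj, low=0.2, high=0.6):
    """Return (spurious, potential) lists of protein pairs.

    low and high are illustrative thresholds, not values from the text.
    """
    spurious, potential = [], []
    for p1, p2 in combinations(sorted(adj), 2):
        score = h_confidence(adj, p1, p2)
        connected = p2 in adj[p1]
        if connected and score <= low:
            spurious.append((p1, p2))   # existing edge, low score
        elif not connected and score >= high:
            potential.append((p1, p2))  # missing edge, high score
    return spurious, potential


# Toy network (hypothetical): A-B-C-D form a clique missing the edge A-B,
# E-F-G form a triangle, and the edge A-E bridges the two groups.
adj = {
    "A": {"C", "D", "E"},
    "B": {"C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A", "F", "G"},
    "F": {"E", "G"},
    "G": {"E", "F"},
}
spurious, potential = classify_interactions(adj)
```

In this toy network, the bridging edge A-E has no common neighbors and is flagged as spurious, while the unconnected pair A-B shares both C and D and is flagged as a potential interaction.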

† Available online at http://dip.doe-mbi.ucla.edu/dip/Services.cgi?SM=2. Accessed July 12, 2008.
