Scientific Data Analysis - Scientific Data Management

Using these definitions, Pandey et al. [57] proposed the following graph transformation approach for purifying available interaction datasets. Consider the input interaction network G = (V, E), where V is the set of nodes representing the proteins in the network, and E is the set of edges representing the protein-protein interactions constituting the network. First, the h-confidence measure is computed between each pair of constituent proteins, whether connected or unconnected by an edge in the input network. Next, a threshold is applied to drop the protein pairs with a low h-confidence, which removes spurious interactions and controls the density of the network. The resultant graph G' = (V, E') is hypothesized to be a less noisy and more complete version of G, since it is expected to contain fewer noisy edges, some biologically viable edges that were not present in the original graph, and more accurate weights on the remaining edges.
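
The two steps above (score every protein pair, then threshold) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. It assumes an unweighted network and takes the h-confidence of a pair to be the number of shared neighbors divided by the larger of the two neighbor counts; the exact definition is given earlier in the chapter and is not reproduced here. The names h_confidence and transform_graph, and the toy network, are hypothetical.

```python
from itertools import combinations


def h_confidence(neighbors, u, v):
    """Assumed pairwise h-confidence for an unweighted network:
    |N(u) & N(v)| / max(|N(u)|, |N(v)|)."""
    nu, nv = neighbors[u], neighbors[v]
    denom = max(len(nu), len(nv))
    if denom == 0:
        return 0.0  # two isolated proteins share nothing
    return len(nu & nv) / denom


def transform_graph(nodes, edges, threshold):
    """Return the edges of the transformed graph as a dict
    {(u, v): h-confidence}, keeping only pairs at or above the threshold."""
    # Build neighbor sets from the input edge list (undirected).
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    new_edges = {}
    # Score every pair, whether or not it is connected in the input network.
    for u, v in combinations(sorted(nodes), 2):
        score = h_confidence(neighbors, u, v)
        if score >= threshold:
            new_edges[(u, v)] = score
    return new_edges


# Hypothetical toy network.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "C"), ("B", "D"), ("C", "E")]
result = transform_graph(nodes, edges, threshold=0.5)
```

In this toy network the original edge A-B survives, the pair C-D gains a new edge because C and D share two neighbors, and the weakly supported edge C-E is dropped. The surviving scores act as the new edge weights.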

In order to evaluate the efficacy of the resultant networks for protein function prediction, we provided the original and the transformed graphs as input to the FunctionalFlow algorithm [59]. FunctionalFlow is a graph-theory-based algorithm that enables insufficiently connected proteins to obtain functional annotations from distant proteins in the network and has produced much better results than several other function prediction algorithms operating on protein interaction networks. We also tested several transformed versions of the input network generated using our graph transformation approach in conjunction with some other common neighbor-based similarity measures, such as the number of common neighbors and Samanta et al.'s p-value measure [55]. Figure 8.4 shows the performance of the FunctionalFlow algorithm on these transformed versions of two standard interaction networks, measured in terms of the accuracy of the top-scoring 1,000 predictions of the functions of the constituent proteins.
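
The evaluation measure can be illustrated with a short Python sketch. It assumes that accuracy means the fraction of the k highest-scoring (protein, function) predictions that agree with known annotations; the precise protocol behind Figure 8.4 is not spelled out here, so this is one plausible reading rather than the procedure actually used. The function name top_k_accuracy and the sample data are hypothetical.

```python
def top_k_accuracy(predictions, known_annotations, k=1000):
    """Fraction of the k highest-scoring predictions that are correct.

    predictions: iterable of (protein, function, score) tuples.
    known_annotations: dict mapping protein -> set of known functions.
    """
    # Rank predictions by descending score and keep the top k.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)[:k]
    if not ranked:
        return 0.0
    correct = sum(
        1
        for protein, function, _score in ranked
        if function in known_annotations.get(protein, set())
    )
    return correct / len(ranked)


# Hypothetical predictions and annotations.
predictions = [
    ("P1", "f1", 0.9),
    ("P2", "f2", 0.8),
    ("P3", "f1", 0.7),
    ("P4", "f3", 0.1),
]
known = {"P1": {"f1"}, "P2": {"f1"}, "P3": {"f1"}, "P4": {"f3"}}
accuracy = top_k_accuracy(predictions, known, k=3)
```

With k=3, the prediction for P2 is the only one of the top three that disagrees with the known annotations, so two of three are counted as correct.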

The predictions derived from the h-confidence-based transformations of both standard interaction networks are significantly more accurate than those derived from the original networks. One network (combined) is constructed by combining several popular yeast interaction datasets and weighted using the EPR index tool; the other (DIPCore) is a confident subset of the DIP database [35]. This improvement shows that the association analysis-based graph transformation approach is indeed able to reduce noise, enhance completeness, and assign more reliable weights to the constituent edges. The h-confidence measure also substantially outperformed the other similarity measures. This result is consistent with those of an earlier study, where h-confidence and hypercliques were used to eliminate noisy objects from datasets [60].

8.4.2 Future Directions

The above discussion shows that the preprocessing of biological data can enhance the performance of standard function prediction algorithms substantially, and thus should be considered as an integral step of the process of
