Effects of Network Structure Improvement on Distributed RDF Querying - Data Management in Cloud, Grid and P2P Systems


management under churn¹, they also destroy the ordering of the index by using hashing, and along with it the grouping of semantically related data; e.g., data of a university domain cannot be stored on a contiguous interval and is spread over the complete table. This can cause more routing when collecting data from the same domain to evaluate a query.
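The scattering effect of hashing can be illustrated with a minimal sketch. The URIs, the 16-bit ring size, and the helper names `ring_position` and `is_contiguous` are assumptions made for illustration and are not taken from the paper; the contiguity check also ignores wrap-around on the ring for simplicity.

```python
import hashlib

def ring_position(key: str, bits: int = 16) -> int:
    """Map a key to a position on a 2**bits identifier ring via SHA-1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def is_contiguous(ordered_keys, prefix):
    """True if all keys with the prefix form one unbroken run in the ordering."""
    flags = [k.startswith(prefix) for k in ordered_keys]
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    return all(flags[first:last + 1])

# Hypothetical URIs: three from one university domain, two from elsewhere.
keys = [
    "http://uni.example/course/1",
    "http://uni.example/course/2",
    "http://uni.example/prof/7",
    "http://shop.example/item/3",
    "http://zoo.example/animal/9",
]

by_order = sorted(keys)                    # order-preserving placement
by_hash = sorted(keys, key=ring_position)  # hash-based placement

print("lexicographic order keeps the domain together:",
      is_contiguous(by_order, "http://uni.example/"))
for k in by_hash:
    print(ring_position(k), k)
```

In lexicographic order the university keys always form one run, whereas their hash positions bear no relation to the shared prefix, so they generally land on unrelated parts of the ring.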

GridVine [7] and our proposed distributed RDF system 3rdf [8] address this by using the P2P networks P-Grid [9] and 3nuts [10], respectively, each providing a distributed search tree for order-preserving indexing. Domain-related prefixes (namespaces) in the subjects, predicates, and objects of RDF triples place triples of the same domain in the same branches of the search tree. As a result, the data belonging to the same domain is stored on nearby peers (in the metric of the overlay routing structure) or even on the same peer.
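To make the benefit of order-preserving indexing concrete, the following sketch replaces the distributed search tree with a plain sorted list that is cut into contiguous chunks, one per peer. This is a deliberate simplification and not the P-Grid or 3nuts structure; the URIs, the chunk size, and the function names `prefix_range` and `peers_for_prefix` are hypothetical.

```python
from bisect import bisect_left

def prefix_range(sorted_keys, prefix):
    """Return the half-open index range [lo, hi) of keys starting with prefix.

    Assumes no key contains the code point U+FFFF, which serves as an
    upper bound for everything sharing the prefix.
    """
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, prefix + "\uffff")
    return lo, hi

def peers_for_prefix(sorted_keys, prefix, keys_per_peer):
    """List the peers (numbered by chunk) that hold keys with the prefix."""
    lo, hi = prefix_range(sorted_keys, prefix)
    if lo == hi:
        return []
    return list(range(lo // keys_per_peer, (hi - 1) // keys_per_peer + 1))

# Hypothetical subject URIs, kept in lexicographic order.
keys = sorted([
    "http://art.example/work/5",
    "http://shop.example/item/3",
    "http://uni.example/course/1",
    "http://uni.example/course/2",
    "http://uni.example/prof/7",
    "http://zoo.example/animal/9",
])

# With two keys per peer, the university namespace spans adjacent peers only.
print(peers_for_prefix(keys, "http://uni.example/", keys_per_peer=2))
```

Because keys with a common namespace prefix are adjacent in the ordering, a query over that namespace only has to visit a run of neighboring peers instead of peers spread over the whole network.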

In this paper, we evaluate the performance of distributed RDF systems when using either the Chord or the 3nuts P2P network and when using two different RDF data distributions in the network: the state-of-the-art indexing by subject, predicate, and object, and a novel indexing introduced in this paper. In Section 2 we provide an overview of distributed RDF systems and distributed query evaluation techniques related to this work. In Section 3 we present our simulation, including the overlay networks, the RDF data and query model, the data distribution with our new indexing scheme for a fairer data distribution, and the query processing with speed-ups that exploit additional features of the 3nuts network. Based on this simulator, Section 4 presents simulation results on the performance of the RDF system, measured in routing steps and time for RDF query evaluation. Finally, we conclude in Section 5 and give a brief outlook on future work.

2 Related Work

With more and more Web resources annotated with RDF information, distributed solutions for storing and querying RDF data are needed. Several projects have proposed peer-to-peer networks for the distributed evaluation of RDF data.

The majority of these projects, such as RDFPeers [3], Atlas [4,5], and BabelPeers [6], use DHTs for the storage and querying of RDF data. The basic idea here is to store each triple at three locations using the hash values of subject, predicate, and object. Triples with a specific subject, predicate, or object are obtained during query evaluation by computing the hash value of that specific key again to resolve the peer providing these triples. To improve the query load distribution, the authors of [5] additionally index the triples by the combinations of triple components 'subject+predicate', 'subject+object', 'predicate+object', and 'subject+predicate+object', giving 7 replicas of each triple in total. In Section 3.3 we present a similar technique but with another objective, which offers a more balanced data distribution. The authors of [3] measure the amount of highly frequent triple components. The authors of [3,6] mention that triples

¹ Peers entering/leaving the network only invoke local changes and take over/shed data to neighbors.
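The DHT-based triple indexing described in Section 2 can be sketched as follows: each triple is stored under the hash of its subject, predicate, and object, and optionally under the four component combinations as well, which yields seven index keys per triple. The peer count, the space separator used for combined keys, the class `TripleStore`, and the sample triples are assumptions for illustration; none of the cited systems is claimed to work exactly this way.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_PEERS = 8  # hypothetical network size

def peer_for(key: str) -> int:
    """Resolve the peer responsible for a key by hashing it."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PEERS

def index_keys(triple, combined=False):
    """Keys for one triple: 3 single components, or 7 with all combinations."""
    max_size = 3 if combined else 1
    return [
        " ".join(combo)
        for size in range(1, max_size + 1)
        for combo in combinations(triple, size)
    ]

class TripleStore:
    """Toy DHT: a mapping peer -> key -> set of triples."""

    def __init__(self):
        self.peers = defaultdict(lambda: defaultdict(set))

    def insert(self, triple, combined=False):
        for key in index_keys(triple, combined):
            self.peers[peer_for(key)][key].add(triple)

    def lookup(self, *components):
        """Look up by one or more components, given in s, p, o order."""
        key = " ".join(components)
        return self.peers[peer_for(key)][key]

store = TripleStore()
t1 = ("ex:alice", "ex:teaches", "ex:databases")
t2 = ("ex:bob", "ex:teaches", "ex:networks")
store.insert(t1, combined=True)
store.insert(t2, combined=True)

print(len(index_keys(t1)), len(index_keys(t1, combined=True)))
print(store.lookup("ex:teaches"))
print(store.lookup("ex:alice", "ex:teaches"))
```

Looking up a frequent component such as the predicate returns every triple that uses it from a single peer, whereas a combined key narrows the result to fewer triples held by a different peer.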
