management under churn¹, they also destroy the ordering of the index by using
hashing, and with it the grouping of semantically related data. For example, the
data of a university domain cannot be stored on a contiguous interval and is
instead spread over the complete table. This can cause more routing when
collecting data from the same domain to evaluate a query.
GridVine [7] and our proposed distributed RDF system 3rdf [8] address this
by using the P2P networks P-Grid [9] and 3nuts [10] respectively, providing a
distributed search tree for order-preserving indexing. Domain-related prefixes
(namespaces) in the subjects, predicates, and objects of RDF triples place
triples of the same domain in the same branches of the search tree. As a result,
the data belonging to the same domain is stored on nearby peers (in the metric
of the overlay routing structure) or even on the same peer.
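The contrast between hashed and order-preserving placement can be illustrated with a minimal sketch. It is not the mechanism of Chord, P-Grid, or 3nuts: the hashed variant simply takes a SHA-1 value modulo the number of peers, and the ordered variant cuts the sorted key sequence into equal contiguous ranges. The function names, the `example.` URIs, and the peer count are assumptions made for illustration only.

```python
import hashlib


def hash_placement(keys, num_peers):
    """DHT-style placement (simplified): peer index = SHA-1(key) mod num_peers."""
    placement = {}
    for key in keys:
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        placement[key] = int.from_bytes(digest, "big") % num_peers
    return placement


def ordered_placement(keys, num_peers):
    """Order-preserving placement (simplified): sort the keys and give each
    peer one contiguous range of the sorted sequence."""
    ranked = sorted(keys)
    per_peer = -(-len(ranked) // num_peers)  # ceiling division
    return {key: index // per_peer for index, key in enumerate(ranked)}


def peers_for_prefix(placement, prefix):
    """Peers that must be contacted to collect all keys sharing a prefix."""
    return sorted({peer for key, peer in placement.items()
                   if key.startswith(prefix)})


def example_keys():
    """Hypothetical subject URIs: three domains with six resources each."""
    domains = ("uni-a", "uni-b", "uni-c")
    return [f"http://{d}.example/res/{i}" for d in domains for i in range(6)]


if __name__ == "__main__":
    keys = example_keys()
    for name, place in (("hash", hash_placement), ("ordered", ordered_placement)):
        placement = place(keys, num_peers=6)
        print(name, peers_for_prefix(placement, "http://uni-b.example/"))
```

In this sketch, keys that share a namespace prefix are adjacent in sorted order, so the ordered placement assigns each domain to a run of neighboring peers. The hashed placement gives no such guarantee, and the peers holding one domain can lie anywhere in the table.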
In this paper, we evaluate the performance of distributed RDF systems when
using either the Chord or the 3nuts P2P network, and when using two different
RDF data distributions in the network: the state-of-the-art indexing by subject,
predicate, and object, and a novel indexing introduced in this paper. In
Section 2 we provide an overview of distributed RDF systems and distributed
query evaluation techniques related to this work. In Section 3 we present our
simulation, including the overlay networks, the RDF data and query model, the
data distribution with our new indexing scheme for a fairer data distribution,
and the query processing, including speed-ups gained by exploiting additional
features of the 3nuts network. Section 4 presents the simulation results on the
performance of the RDF system, measured in routing steps and in time for RDF
query evaluation. Finally, we conclude in Section 5 and give a brief outlook on
future work.
2 Related Work
With more and more Web resources annotated with RDF information, distributed
solutions for storing and querying RDF data are needed. Several projects
have proposed peer-to-peer networks for the distributed evaluation of RDF data.
The majority of these projects, such as RDFPeers [3], Atlas [4,5], and
BabelPeers [6], use DHTs for the storage and querying of RDF data. The basic
idea is to store each triple at three locations, determined by the hash values
of its subject, predicate, and object. During query evaluation, triples with a
specific subject, predicate, or object are obtained by computing the hash value
of that key again to resolve the peer providing these triples. To improve the
query load distribution, the authors of [5] additionally index the triples by
the combinations of triple components 'subject+predicate', 'subject+object',
'predicate+object', and 'subject+predicate+object', so that each triple is
stored seven times in total. In Section 3.3 we present a similar technique with
a different objective: a more balanced data distribution. The authors of [3]
measure the amount of highly frequent triple components. The authors of [3,6]
mention that triples
¹ Peers entering or leaving the network only invoke local changes and take over
data from, or shed data to, their neighbors.