Effects of Network Structure Improvement on Distributed RDF Querying - Data Management in Cloud, Grid and P2P Systems

Effects of Network Structure Improvement on Distributed RDF Querying

Liaquat Ali, Thomas Janson, Georg Lausen, and Christian Schindelhauer

University of Freiburg

{ali,janson,lausen,schindel}@informatik.uni-freiburg.de

Abstract. In this paper, we analyze the performance of distributed RDF systems in a peer-to-peer (P2P) environment. We compare the performance of P2P networks based on Distributed Hash Tables (DHTs) with that of search-tree-based networks. Our simulations show a performance improvement by a factor of 2 when using search-tree-based networks. This is achieved by grouping related data, which tend to be accessed together in a query, in branches of the tree; e.g., the data of a university domain is in one branch. We observe a strongly unbalanced data distribution when indexing the RDF triples by subject, predicate, and object, which raises the question of scalability for huge data sets; e.g., the peer responsible for the predicate 'type' is overloaded. However, we show how to exploit this unbalanced data distribution and how to speed up the evaluation of queries dramatically with only a few additional routing links, so-called shortcuts, to these frequently occurring triple components. These routing shortcuts can be established with only a constant increase in the size of the peers' routing tables. To cope with hotspots of unfair load balancing, we propose a novel indexing scheme where triples are indexed 'six instead of three times', with only 23% data overhead in experiments and the possibility of more parallelism in query processing. For experiments, we use the LUBM data set and benchmark queries.
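To illustrate the three-fold indexing by subject, predicate, and object mentioned above, and why it can overload a single peer, here is a minimal Python sketch. It is not the authors' implementation: the names `NUM_PEERS`, `peer_for`, `index_triple`, `build_index`, and `key_load` are hypothetical, and placing keys by hash modulo the number of peers is a simplifying assumption that stands in for real DHT routing.

```python
import hashlib
from collections import Counter

NUM_PEERS = 8  # hypothetical network size, chosen only for illustration


def peer_for(key: str, num_peers: int = NUM_PEERS) -> int:
    """Map a key to a peer id by hashing (assumption: modulo placement
    instead of a real DHT's ring or routing structure)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_peers


def index_triple(triple, store):
    """Store one copy of the triple at the peer responsible for each of
    its three components (subject, predicate, object)."""
    for component in triple:
        store.setdefault(peer_for(component), []).append(triple)


def build_index(triples):
    """Index every triple three times and return the per-peer storage."""
    store = {}
    for triple in triples:
        index_triple(triple, store)
    return store


def key_load(triples):
    """Count, for each key, how many triples are indexed under it."""
    load = Counter()
    for s, p, o in triples:
        load.update({s, p, o})  # a key counts once per triple
    return load


if __name__ == "__main__":
    # Toy data: every student has a 'type' triple, so the key 'type'
    # collects as many triples as there are students.
    triples = [(f"student{i}", "type", "Student") for i in range(100)]
    triples += [(f"student{i}", "memberOf", "dept0") for i in range(100)]
    for key, count in key_load(triples).most_common(5):
        print(key, count, "-> peer", peer_for(key))
```

In this toy data set, a key such as 'type' collects one entry per student, while each individual subject key collects only two. Whichever peer is responsible for 'type' therefore holds a disproportionate share of the index, which is the kind of hotspot the abstract refers to.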

1 Introduction

A goal of the Semantic Web [1] initiative is to integrate data from web resources into machine-driven evaluation. The Resource Description Framework (RDF [2]) data model has been proposed by the W3C to encode these data. To cope with the anticipated load of Semantic Web data, several projects have emerged that study distributed solutions for the storage and querying of RDF data. State-of-the-art distributed RDF data stores such as RDFPeers [3], Atlas [4,5], and BabelPeers [6] use Distributed Hash Tables (DHTs) to store and query RDF data in a distributed manner. To enable an efficient search for RDF triples with the same subject, predicate, or object, these distributed RDF databases index each triple three times, once for each triple component (subject, predicate, and object). While DHTs provide fair load balancing properties with easy data
