Effects of Network Structure Improvement
on Distributed RDF Querying
Liaquat Ali, Thomas Janson, Georg Lausen, and Christian Schindelhauer
University of Freiburg
{ali,janson,lausen,schindel}@informatik.uni-freiburg.de
http://www.informatik.uni-freiburg.de
Abstract. In this paper, we analyze the performance of distributed
RDF systems in a peer-to-peer (P2P) environment. We compare the
performance of P2P networks based on Distributed Hash Tables (DHTs)
with that of search-tree based networks. Our simulations show a
performance boost of a factor of 2 when using search-tree based networks.
This is achieved by grouping related data, which tend to be accessed
together in a query, in the same branch of the tree, e.g., the data of a
university domain is in one branch.
We observe a strongly unbalanced data distribution when indexing the
RDF triples by subject, predicate, and object, which raises the question
of scalability for huge data sets, e.g., the peer responsible for the
predicate 'type' is overloaded. However, we show how to exploit this
unbalanced data distribution and how to speed up the evaluation of queries
dramatically with only a few additional routing links, so-called shortcuts,
to these frequently occurring triple components. These routing shortcuts
can be established with only a constant increase in the size of each peer's
routing table. To cope with hotspots caused by unfair load balancing, we
propose a novel indexing scheme where triples are indexed 'six instead of
three times' with only 23% data overhead in experiments and the possibility
of more parallelism in query processing. For experiments, we use the LUBM
data set and benchmark queries.
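The load imbalance described in the abstract can be illustrated with a minimal sketch. It assumes a simple modulo-based hash placement as a stand-in for a real DHT lookup; the peer count, the function names (peer_for, index_triples), and the toy triples are hypothetical and are not taken from the paper or from the LUBM data set.

```python
import hashlib
from collections import Counter

NUM_PEERS = 8  # hypothetical network size, chosen only for illustration


def peer_for(key: str, num_peers: int = NUM_PEERS) -> int:
    """Map a key to a peer id via SHA-1 (a stand-in for a DHT key lookup)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_peers


def index_triples(triples, num_peers: int = NUM_PEERS):
    """Store every triple three times: under its subject, predicate, and object."""
    stores = {p: [] for p in range(num_peers)}
    for triple in triples:
        for component in triple:
            stores[peer_for(component, num_peers)].append(triple)
    return stores


# Toy data (illustrative only): every entity carries a 'type' triple,
# so the key 'type' occurs far more often than any subject or object key.
triples = [(f"student{i}", "type", "Student") for i in range(20)]
triples += [(f"student{i}", "memberOf", f"dept{i % 4}") for i in range(20)]

stores = index_triples(triples)
load = Counter({p: len(ts) for p, ts in stores.items()})
type_peer = peer_for("type")

print(f"total stored copies: {sum(load.values())}")
print(f"peer {type_peer} (responsible for 'type') holds {load[type_peer]} copies")
```

In this toy setting, the peer that owns the key 'type' receives at least one copy of every 'type' triple, while subject keys spread across the peers, which mirrors the hotspot the abstract mentions.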
1 Introduction
A goal of the Semantic Web [1] initiative is to integrate data from web resources
into machine-driven evaluation. The Resource Description Framework (RDF [2])
data model has been proposed by the W3C to encode these data. To cope with
the anticipated load of the Semantic Web data, several projects have emerged
that have studied distributed solutions for the storage and querying of RDF data.
State-of-the-art distributed RDF data stores such as RDFPeers [3], Atlas [4,5],
and BabelPeers [6] use Distributed Hash Tables (DHTs) to store and query RDF
data in a distributed manner. To attain an efficient search for RDF triples with
the same subject, predicate, or object, the triples are indexed three times, once
for each triple component (subject, predicate, and object), in these distributed RDF
databases. While DHTs provide fair load-balancing properties with easy data
 