Effects of Network Structure Improvement on Distributed RDF Querying - Data Management in Cloud, Grid and P2P Systems

are not distributed uniformly in the network because of non-uniformly distributed

index keys. In this paper we will provide an analysis of the data distribution.

Traditional DHT-based P2P networks, such as Chord [11], are used as

underlying overlay networks in the aforementioned distributed RDF data stores,

applying uniform hash functions to map data keys to peers in the network.

This achieves good storage load balancing but sacrifices the preservation of the

semantic proximity of RDF triples because it destroys existing relations among

the inserted triple keys (attributes) based on their order. RDF triple keys that

are semantically close at the application level are heavily fragmented in DHTs,

and hence the efficiency of RDF range queries or queries posed on semantically

related attributes is significantly degraded in these networks.
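The fragmentation described above can be illustrated with a small sketch. It assumes a network of 16 peers, MD5 as a stand-in for the uniform hash function, and made-up predicate keys whose shared prefix stands for semantic proximity; none of these choices come from the systems cited here. The order-preserving variant simply cuts the sorted key list into contiguous blocks, which is a simplification and not the placement scheme of any particular overlay.

```python
import hashlib

NUM_PEERS = 16   # illustrative network size (assumption)
M = 128          # key-space bits; MD5 yields exactly 128 bits

def hashed_peer(key: str) -> int:
    """Uniform hashing: peer index derived from the MD5 position of the key."""
    position = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    return position * NUM_PEERS // 2**M

def ordered_peers(keys):
    """Order-preserving placement: sorted keys are cut into contiguous blocks."""
    ranked = sorted(keys)
    block = -(-len(ranked) // NUM_PEERS)  # ceiling division
    return {key: index // block for index, key in enumerate(ranked)}

# Hypothetical predicate keys; a shared prefix stands in for semantic proximity.
keys = [f"{prefix}:attr{i:03d}"
        for prefix in ("dc", "foaf", "rdfs", "skos")
        for i in range(40)]

ordered = ordered_peers(keys)
foaf_keys = [k for k in keys if k.startswith("foaf:")]

# Peers that a query over all "foaf:" keys would have to contact.
peers_hashed = {hashed_peer(k) for k in foaf_keys}
peers_ordered = {ordered[k] for k in foaf_keys}

print("peers contacted with uniform hashing:", len(peers_hashed))
print("peers contacted with order-preserving placement:", len(peers_ordered))
```

In this setup the order-preserving placement keeps the 40 `foaf:` keys on 4 neighbouring peers, whereas the hashed placement spreads them over most of the network, so a range query has to contact far more peers.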

GridVine [7] and 3rdf [8] are other distributed RDF systems proposed for the

storage and querying of RDF data. GridVine and 3rdf use the P-Grid [9] and

3nuts [10] P2P networks, respectively, to provide an order-preserving search tree

instead of a DHT-based search structure. The ordering in the tree can represent

the semantic proximity of closely related RDF triples (e.g., triple predicates

with the same prefix are organized in the same subtree). In contrast to

P-Grid, 3nuts provides the additional feature of so-called interest locality, where

peers with a special interest in a particular search key or path can voluntarily

participate in managing these paths. When co-managing a path and establishing

routing there, a peer enlarges its routing table but retains fast routing in that

path through direct links to other peers in the branch of the path. This is the reason

why we say the peer has a shortcut to that path. Additionally, the peer may also

voluntarily participate in managing data in a path (which we do not make use of

in this work).
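The idea that keys with a common prefix end up in the same subtree, and that a direct link into a subtree saves routing steps, can be sketched with a plain character trie. This is only a loose, single-machine analogy: the `Node` class, the helper functions, and the `shortcuts` dictionary are hypothetical names introduced for illustration and do not reflect the actual data structures or protocols of P-Grid or 3nuts.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    children: dict = field(default_factory=dict)  # next character -> Node
    keys: list = field(default_factory=list)      # full keys ending at this node

def insert(root: Node, key: str) -> None:
    """Insert a key by walking or creating one node per character."""
    node = root
    for ch in key:
        node = node.children.setdefault(ch, Node())
    node.keys.append(key)

def descend(node: Node, path: str):
    """Follow path one character at a time; return (node or None, hops taken)."""
    hops = 0
    for ch in path:
        node = node.children.get(ch)
        if node is None:
            return None, hops
        hops += 1
    return node, hops

def collect(node: Node) -> list:
    """Return all keys stored in the subtree, in lexicographic order."""
    found = list(node.keys)
    for ch in sorted(node.children):
        found.extend(collect(node.children[ch]))
    return found

root = Node()
for key in ["foaf:name", "foaf:knows", "foaf:mbox", "dc:title", "dc:creator"]:
    insert(root, key)

# All predicates with the prefix "foaf:" live in one subtree.
subtree, hops_from_root = descend(root, "foaf:")
foaf_keys = collect(subtree)

# Shortcut analogy: a direct reference to the subtree of interest
# lets a lookup skip the hops from the root down to that subtree.
shortcuts = {"foaf:": subtree}
node, hops_with_shortcut = descend(shortcuts["foaf:"], "name")

print(foaf_keys, hops_from_root, hops_with_shortcut)
```

Reaching `foaf:name` from the root takes nine character steps here, while starting from the stored subtree reference takes four. The price is the extra entry in `shortcuts`, which mirrors the trade-off between a larger routing table and faster routing mentioned above.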

3 Simulation

3.1 Network Models

We simulate a distributed RDF system using either the DHT-based overlay

Chord or the search-tree-based overlay 3nuts and compare the performance of

the two. The results might be transferable to the complete classes of

DHT-based and search-tree-based peer-to-peer networks. The basic difference

between the two is explained below.

DHT-Based Overlay Networks. The majority of the state-of-the-art

distributed RDF systems still use DHTs [12] for data allocation in the distributed

system. In a DHT, each peer and data item has an identifier, e.g., network address

and file name, which is hashed to a hash key in the key space $[0, 2^m)$ for a typical

constant $m = 128$ for 128-bit keys. A peer is then assigned all data whose

hash key lies between its own hash key and the next larger hash key of another

peer in the key space ring. It can be shown that the key space range assigned

to any peer is not greater than a factor $O(\log n)$ of the expected key space range,

which is $2^m/n$ for $n$ peers in the network. This results in a fair load balancing of
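The key assignment on the ring can be sketched as follows. The sketch follows the rule as worded in this section (a peer holds the keys between its own hash key and the next larger peer key), uses MD5 only because it produces 128-bit values matching $m = 128$, and works on made-up peer addresses and a made-up file name. All function names are hypothetical, and the hash function and assignment direction of a concrete DHT such as Chord may differ.

```python
import bisect
import hashlib

M = 128  # key-space bits, matching the 128-bit keys in the text

def hash_key(identifier: str) -> int:
    """Hash an identifier to a position in [0, 2^M); MD5 is an assumption."""
    digest = hashlib.md5(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

def build_ring(peers):
    """Return (position, peer) pairs sorted by position on the ring."""
    return sorted((hash_key(peer), peer) for peer in peers)

def responsible_peer(ring, item: str) -> str:
    """Peer with the largest position not exceeding the item's position."""
    positions = [position for position, _ in ring]
    index = bisect.bisect_right(positions, hash_key(item)) - 1
    return ring[index][1]  # index -1 wraps around to the last peer on the ring

def range_sizes(ring):
    """Size of the key range held by each peer (requires at least two peers)."""
    positions = [position for position, _ in ring]
    count = len(positions)
    return [(positions[(i + 1) % count] - positions[i]) % 2**M
            for i in range(count)]

peers = [f"10.0.0.{i}:4000" for i in range(1, 9)]  # hypothetical addresses
ring = build_ring(peers)
owner = responsible_peer(ring, "example-file.rdf")  # hypothetical item name

sizes = range_sizes(ring)
expected = 2**M / len(ring)  # expected range 2^m / n
print("owner:", owner)
print("largest range relative to 2^m/n:", max(sizes) / expected)
```

The printed ratio shows how far the largest range of a single peer deviates from the expected range $2^m/n$ in one random instance; it is an empirical illustration of the stated bound, not a proof of it.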
