are not distributed uniformly in the network because the index keys are not
uniformly distributed. In this paper we provide an analysis of the data distribution.
Traditional DHT-based P2P networks, such as Chord [11], serve as the underlying
overlay networks in the aforementioned distributed RDF data stores. They apply
uniform hash functions to map data keys to peers in the network.
This achieves good storage load balancing but sacrifices the semantic proximity
of RDF triples, because hashing destroys the order-based relations among
the inserted triple keys (attributes). RDF triple keys that are semantically
close at the application level are heavily fragmented in DHTs, and hence the
efficiency of RDF range queries, or of queries posed on semantically related
attributes, is significantly degraded in these networks.
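The fragmentation effect can be illustrated with a minimal sketch. It is not the implementation used by Chord or by any of the cited systems: the hash function (SHA-1 truncated to 128 bits), the peer names, and the helper names hash_key, build_ring and owner are all assumptions made for illustration. Each peer is taken to own the keys from its own hash key up to the next larger peer hash key on the ring.

```python
import hashlib
from bisect import bisect_right

M = 128  # key space is [0, 2^M); 128 bits is an illustrative choice


def hash_key(identifier: str) -> int:
    """Map an identifier to a key in [0, 2^M) using SHA-1 (assumed hash)."""
    digest = hashlib.sha1(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def build_ring(peer_names):
    """Return the peers as (hash key, name) pairs sorted along the ring."""
    return sorted((hash_key(name), name) for name in peer_names)


def owner(ring, data_key: str) -> str:
    """Return the peer whose hash key is the largest one <= hash of data_key.

    Keys smaller than every peer hash wrap around to the last peer.
    """
    peer_hashes = [peer_hash for peer_hash, _ in ring]
    index = bisect_right(peer_hashes, hash_key(data_key)) - 1
    return ring[index][1]  # index -1 selects the last peer (wrap-around)


if __name__ == "__main__":
    # Hypothetical peers and predicates that share the prefix "foaf:".
    ring = build_ring([f"peer{i}" for i in range(8)])
    for predicate in ["foaf:age", "foaf:knows", "foaf:name", "foaf:mbox"]:
        print(predicate, "->", owner(ring, predicate))
```

Because the hash ignores the shared prefix, the peer chosen for each predicate bears no relation to the lexicographic order of the keys, so there is no reason to expect related predicates to end up on the same or on neighbouring peers.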
GridVine [7] and 3rdf [8] are other distributed RDF systems proposed for the
storage and querying of RDF data. GridVine and 3rdf use the P-Grid [9] and
3nuts [10] P2P networks, respectively, to provide an order-preserving search tree
instead of a DHT-based search structure. The ordering in the tree can represent
the semantic proximity of closely related RDF triples (e.g., triple predicates
with the same prefix are organized in the same subtree). In contrast to P-Grid,
3nuts provides the additional feature of so-called interest locality, where
peers with a special interest in a particular search key or path can voluntarily
participate in managing these paths. When co-managing a path and establishing
routing there, a peer enlarges its routing table but retains fast routing in that
path through direct links to other peers in the branch of the path. This is why
we say that the peer has a shortcut to that path. Additionally, the peer may
voluntarily participate in managing data in a path (which we do not make use of
in this work).
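The benefit of an order-preserving structure for prefix queries can be sketched as follows. This is a deliberately simplified stand-in, not the P-Grid or 3nuts search tree: a lexicographically sorted key list is split into contiguous ranges, one per peer. The predicate names and the helpers prefix_range and peers_for_prefix are hypothetical.

```python
from bisect import bisect_left


def prefix_range(sorted_keys, prefix):
    """Return (lo, hi) so that sorted_keys[lo:hi] are the keys with prefix.

    In a lexicographically sorted list, all keys sharing a prefix are
    adjacent, which is the property an order-preserving overlay exploits.
    """
    lo = bisect_left(sorted_keys, prefix)
    hi = lo
    while hi < len(sorted_keys) and sorted_keys[hi].startswith(prefix):
        hi += 1
    return lo, hi


def peers_for_prefix(sorted_keys, prefix, n_peers):
    """Return the peers (numbered 0..n_peers-1) that hold keys with prefix.

    Assumption: peer i stores the i-th contiguous slice of the sorted keys.
    """
    lo, hi = prefix_range(sorted_keys, prefix)
    total = len(sorted_keys)
    return sorted({pos * n_peers // total for pos in range(lo, hi)})


if __name__ == "__main__":
    keys = sorted([
        "foaf:age", "foaf:knows", "foaf:name", "dc:creator",
        "dc:title", "rdf:type", "rdfs:label", "rdfs:subClassOf",
    ])
    print(prefix_range(keys, "foaf:"))
    print(peers_for_prefix(keys, "foaf:", n_peers=4))
```

Since keys with a common prefix occupy one contiguous range, a prefix query only needs to contact the peers whose ranges overlap it, and these peers are neighbours in key order.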
3 Simulation
3.1 Network Models
We simulate a distributed RDF system using either the DHT-based overlay
Chord or the search-tree-based overlay 3nuts and compare the performance of
the two. The results might be transferable to the complete classes of
DHT-based and search-tree-based peer-to-peer networks. The basic difference
between the two is explained below.
DHT-Based Overlay Networks. The majority of state-of-the-art distributed
RDF systems still use DHTs [12] for data allocation in the distributed
system. In a DHT, each peer and each data item has an identifier, e.g. a network
address or a file name, which is hashed to a hash key in the key space [0, 2^m)
for a constant m, typically m = 128 for 128-bit keys. A peer is then assigned all
data whose hash key lies between its own hash key and the next larger hash key of
another peer on the key space ring. It can be shown that the key space range
assigned to any peer is at most a factor O(log n) larger than the expected key
space range, which is 2^m/n for n peers in the network. This results in a fair
load balancing of