A Distributed Publish/Subscribe System for RDF Data - Data Management in Cloud, Grid and P2P Systems

discussion is given about how duplicates are avoided when in-memory buffers

overflow. Moreover, subscriptions are formulated through an ad-hoc scripting

language and no experimental evaluation is available.

3 EventCloud Design

In this section we give a description of the data and subscription model used by

our system, dubbed EventCloud. We then explain how events (more specifically,
RDF data) and subscriptions are indexed in a CAN network.

3.1 Data and Subscription Model

Our data and subscription model follows the approach taken in [15]: users insert
and publish information in RDF, and formulate queries and subscriptions in
SPARQL, so that stored data and published events share one model, as do queries
and subscriptions.

Events. The data are expressed in the RDF model using 4-tuples (quadruples)
whose elements are named RDF terms. In our system an RDF term may be
either an IRI or a Literal value. Quadruples generated at the same time by a
given source form a Compound Event (CE), as defined by (1b). Each CE is made
of a list of quadruples, and all quadruples of a CE share a common term called
the graph value. This term is built from a combination of a unique source
identifier and a timestamp. The graph value serves two purposes. First, it
identifies the event and its source. Second, it links several quadruples
together, which emulates the multi-attribute values of traditional pub/sub
systems without bounding the number of attributes.

$$q = (g, s, p, o) \mid g, s, p, o \in \mathit{RDFTerm} \tag{1a}$$

$$CE = (q_1, \ldots, q_i, \ldots, q_n) \mid q_i = (g, s_i, p_i, o_i) \tag{1b}$$
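To make definitions (1a) and (1b) concrete, the following is a minimal Python sketch of a quadruple and a compound event. The names `Quadruple`, `make_graph_value` and `make_compound_event`, as well as the `source#timestamp` encoding of the graph value, are assumptions made for illustration. The text only states that the graph value combines a unique source identifier and a timestamp, not how the two are concatenated.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Quadruple:
    """One RDF 4-tuple q = (g, s, p, o), as in (1a)."""
    graph: str
    subject: str
    predicate: str
    obj: str


def make_graph_value(source_id: str, timestamp_ms: int) -> str:
    # Assumed encoding: the source only says that the graph value combines
    # a unique source identifier and a timestamp.
    return f"{source_id}#{timestamp_ms}"


def make_compound_event(
    source_id: str,
    timestamp_ms: int,
    triples: Iterable[Tuple[str, str, str]],
) -> Tuple[Quadruple, ...]:
    """Build a CE as in (1b): every quadruple shares the same graph value g."""
    g = make_graph_value(source_id, timestamp_ms)
    return tuple(Quadruple(g, s, p, o) for (s, p, o) in triples)


# Usage: two attributes published together by one (hypothetical) source.
ce = make_compound_event(
    "http://example.org/sensor1",
    1000,
    [
        ("http://example.org/reading1", "http://example.org/temperature", "21.5"),
        ("http://example.org/reading1", "http://example.org/unit", "celsius"),
    ],
)
```

Because every quadruple of `ce` carries the same graph value, a consumer can regroup the attributes of one event by filtering on that single term, however many attributes the event has.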

The EventCloud is based on a four-dimensional CAN overlay that uses the
lexicographic order for routing requests. The four dimensions of the CAN
coordinate space are mapped respectively to the graph, the subject, the
predicate and the object of any RDF 4-tuple that is indexed. With this mapping,
a quadruple represents a point in the four-dimensional Cartesian space, and
hence is stored by a single peer of the overlay. This indexing approach has
several advantages. First, it supports range queries (looking for values in a
specified range) efficiently. Second, the lexicographic order preserves the
data semantics, so that it gives a form of clustering of quadruples sharing a
common prefix. In contrast, hash-based approaches destroy the natural ordering
of information and make the management of complex queries difficult and
expensive. Figure 1 shows how CEs and subscriptions are mapped to a CAN
network.
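The mapping of a quadruple to a single peer can be sketched as follows. This is an illustrative model only, not the EventCloud implementation. The `Zone` class, the `owner` function, the two-peer split on the subject dimension and the `MIN`/`MAX` sentinels are all assumptions. Python's built-in string comparison (by code point) stands in for the lexicographic order.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# A point in the 4D space: (graph, subject, predicate, object).
Point = Tuple[str, str, str, str]

MIN = ""            # assumed smallest coordinate
MAX = "\U0010ffff"  # assumed upper sentinel; terms are expected to sort below it


@dataclass(frozen=True)
class Zone:
    """Hyper-rectangle managed by one peer: lower inclusive, upper exclusive."""
    lower: Point
    upper: Point

    def contains(self, point: Point) -> bool:
        # Lexicographic comparison, applied independently on each dimension.
        return all(lo <= x < hi
                   for lo, x, hi in zip(self.lower, point, self.upper))


def owner(zones: Sequence[Zone], point: Point) -> int:
    """Return the index of the single zone that contains the point."""
    matches = [i for i, zone in enumerate(zones) if zone.contains(point)]
    if len(matches) != 1:
        raise ValueError("zones must partition the space")
    return matches[0]


# Two hypothetical peers that split the space on the subject dimension at "n".
zones = [
    Zone((MIN, MIN, MIN, MIN), (MAX, "n", MAX, MAX)),
    Zone((MIN, "n", MIN, MIN), (MAX, MAX, MAX, MAX)),
]

q1 = ("g1", "alice", "knows", "bob")
q2 = ("g1", "alicia", "knows", "bob")
q3 = ("g1", "zoe", "knows", "bob")
```

In this sketch `q1` and `q2`, whose subjects share the prefix `alic`, fall into the same zone, while `q3` falls into the other one. This is the clustering effect described above. A hash of the subject would scatter such neighbours across peers, so a range query could no longer be answered by contacting only the zones that overlap the range.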
