A Distributed Publish/Subscribe System for RDF Data - Data Management in Cloud, Grid and P2P Systems


A Distributed Publish/Subscribe System for RDF Data

Laurent Pellegrino, Fabrice Huet, Francoise Baude, and Amjad Alshabani

INRIA-I3S-CNRS, University of Nice-Sophia Antipolis

2004 Route des Lucioles, Sophia Antipolis, France

firstname.lastname@inria.fr

Abstract. The pub/sub communication style is a prevalent messaging pattern for filtering information from distributed and large-scale networks (e.g., the real-time web, sensor networks, etc.) thanks to the decoupling between publishers and subscribers. At the same time, persisting the published information is a prerequisite for any further batch analytics on such large amounts of data. As data can be heterogeneous, relying on a Semantic Web format such as RDF is unavoidable. In this paper we introduce two versions of a content-based pub/sub matching algorithm for RDF-described events, working on an adapted version of the CAN structured P2P network designed to both store and disseminate RDF events. Contrary to existing pub/sub solutions based upon structured overlay networks, which index semantic events several times because they rely on hash functions, we leverage the lexicographic order of the event elements. Thus, only subscriptions, and not publications, have to be duplicated, which is preferable given that in real settings publications may occur more frequently than subscriptions. Furthermore, our system allows publishing events made of any number of elements, and the subscription language leverages the SPARQL query language. The first algorithm we introduce is initially derived from the ideas discussed by Liarou et al., based upon rewriting continuous queries along matching RDF elements (CSBV), with the purpose of performing the matching between subscriptions and several RDF elements on multiple nodes. The experimental results discuss the applicability of the presented algorithms to some synthetic scenarios and identify, accordingly, which pub/sub matching algorithm is the more relevant.
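The central claim above, that ordering event elements lexicographically lets a publication be stored once while only subscriptions are duplicated, can be illustrated with a minimal sketch. It assumes a hypothetical three-dimensional CAN (one dimension each for subject, predicate and object) whose zones are half-open lexicographic intervals, and it simplifies events to single triples with short string terms. SPARQL variables are modeled as `None`. The names `Zone`, `route_publication`, `route_subscription`, `matches` and `ZONES` are illustrative only, not the authors' implementation or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical model: a triple is (subject, predicate, object) as plain strings,
# and None stands for a SPARQL variable in a subscription pattern.
Term = Optional[str]
Interval = Tuple[str, Optional[str]]  # half-open [low, high); high=None means unbounded


@dataclass(frozen=True)
class Zone:
    """A CAN zone: one lexicographic interval per dimension (s, p, o)."""
    name: str
    bounds: Tuple[Interval, Interval, Interval]

    def covers(self, dim: int, value: str) -> bool:
        low, high = self.bounds[dim]
        return low <= value and (high is None or value < high)

    def contains_point(self, triple: Sequence[str]) -> bool:
        # A fully bound triple is a single point in the 3-D key space.
        return all(self.covers(d, v) for d, v in enumerate(triple))

    def overlaps(self, pattern: Sequence[Term]) -> bool:
        # A variable spans its whole dimension, so it never excludes a zone.
        return all(v is None or self.covers(d, v) for d, v in enumerate(pattern))


def route_publication(zones: List[Zone], triple: Sequence[str]) -> List[str]:
    """Zones that store the publication: exactly one when the zones tile the space."""
    return [z.name for z in zones if z.contains_point(triple)]


def route_subscription(zones: List[Zone], pattern: Sequence[Term]) -> List[str]:
    """Zones that receive a copy of the subscription: every zone its region overlaps."""
    return [z.name for z in zones if z.overlaps(pattern)]


def matches(pattern: Sequence[Term], triple: Sequence[str]) -> bool:
    """Local check at a zone: every bound term of the pattern equals the triple's term."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


# Toy overlay: the subject and predicate axes are each split at "m";
# the object axis is not split. "" is the lowest string, so it bounds every axis.
ZONES = [
    Zone("Z1", (("", "m"), ("", "m"), ("", None))),
    Zone("Z2", (("", "m"), ("m", None), ("", None))),
    Zone("Z3", (("m", None), ("", "m"), ("", None))),
    Zone("Z4", (("m", None), ("m", None), ("", None))),
]

if __name__ == "__main__":
    event = ("alice", "knows", "bob")
    sub = (None, "knows", None)  # roughly the pattern: ?s <knows> ?o
    print(route_publication(ZONES, event))  # ['Z1']
    print(route_subscription(ZONES, sub))   # ['Z1', 'Z3']
    print(matches(sub, event))              # True
```

In this toy setting the event lands in zone Z1 only, while the subscription, whose subject and object are variables, is copied to both zones whose predicate interval contains "knows". The hash-based overlays the abstract contrasts against instead index an event several times, once per hashed key, so it is the publications that get replicated there.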

1 Introduction

The advent of the Semantic Web, pioneered by Tim Berners-Lee, encourages the information available on the World Wide Web to become more and more structured. Structured content is made possible by powerful data models such as the Resource Description Framework (RDF), which makes knowledge machine-processable and machine-understandable. Many centralized solutions such as Jena [4], Sesame [1] or OWLIM [11] have been proposed in recent years to store and retrieve RDF data. However, they all suffer from an inherently centralized design that does not scale with the continuous growth of the resources available on the Web. Some
