Relational Techniques for Storing and Querying RDF Data - Advanced Database Query Systems


Table 2. SP2Bench benchmark queries

Q1 Return the year of publication of “Journal 1 (1940)”

Q2 Extract all proceedings with properties: creator, booktitle, issued, partOf, seeAlso, title, pages, homepage, and optionally abstract, including their values

Q3abc Select all articles with property (a) pages (b) month (c) isbn

Q4 Select all distinct pairs of article author names for authors that have published in the same journal

Q5 Return the names of all persons that occur as author of at least one proceeding and at least one article

Q6 Return, for each year, the set of all publications authored by persons that have not published in years before

Q7 Return the titles of all papers that have been cited at least once, but not by any paper that has not been cited itself

Q8 Compute authors that have published with Paul Erdos or with an author that has published with Paul Erdos

Q9 Return incoming and outgoing properties of persons

Q10 Return publications and venues in which “Paul Erdos” is involved either as author or as editor.

Q11 Return the top 10 electronic edition URLs starting from the 51st publication, in lexicographical order.

Q12abc (a) Return yes if a person is an author of at least one proceeding and article. (b) Return yes if an author has published with “Paul Erdos” or with an author that has published with “Paul Erdos”. (c) Return yes if person “John Q. Public” exists.

1. Triple Stores (TS): a single relational table stores the whole set of RDF triples (subject, predicate, object). Following RDF-3X, we build indexes over all six permutations of the three fields of each RDF triple.

2. Binary Table Stores (BS): for each unique predicate in the RDF data, we create a binary table (ID, Value) and build two indexes over the two permutations of its fields.

3. Traditional Relational Stores (RS): we use the Entity-Relationship model of the DBLP dataset and follow the traditional approach to designing a normalized relational schema: we build a separate table for each entity (with its associated descriptive attributes) and use foreign keys to represent the relationships between the different objects. We build specific partitioned B-tree indexes (Graefe, 2003) for each table, based on the attributes referenced in the benchmark queries.

4. Property Table Stores (PS): we use the schema of RS and decompose each entity with at least four attributes into two subject-property tables. The decomposition is done blindly, based on the order of the attributes and without considering the benchmark queries (workload independent).
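
The TS and BS layouts described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions, not the implementation used in the experiments: the sample triples, the predicate names, the table and index names (triples, idx_spo, bs_title, and so on), and the choice of SQLite are all hypothetical. The two queries at the end show how a Q1-style lookup might be phrased against each layout.

```python
import itertools
import sqlite3

# Hypothetical sample triples; the predicate names are illustrative only.
TRIPLES = [
    ("journal1", "title", "Journal 1 (1940)"),
    ("journal1", "issued", "1940"),
    ("article1", "journal", "journal1"),
    ("article1", "creator", "person1"),
]


def build_triple_store(conn, triples):
    """TS: one (subject, predicate, object) table, indexed on all 6 column orders."""
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    index_names = []
    for cols in itertools.permutations(("subject", "predicate", "object")):
        name = "idx_" + "".join(c[0] for c in cols)  # e.g. idx_spo
        conn.execute(f"CREATE INDEX {name} ON triples ({', '.join(cols)})")
        index_names.append(name)
    return index_names


def build_binary_store(conn, triples):
    """BS: one (id, value) table per distinct predicate, indexed in both column orders."""
    tables = []
    for pred in sorted({p for _, p, _ in triples}):
        table = f"bs_{pred}"  # assumes predicate names are safe SQL identifiers
        conn.execute(f"CREATE TABLE {table} (id TEXT, value TEXT)")
        conn.execute(f"CREATE INDEX {table}_iv ON {table} (id, value)")
        conn.execute(f"CREATE INDEX {table}_vi ON {table} (value, id)")
        conn.executemany(
            f"INSERT INTO {table} VALUES (?, ?)",
            [(s, o) for s, p, o in triples if p == pred],
        )
        tables.append(table)
    return tables


conn = sqlite3.connect(":memory:")
ts_indexes = build_triple_store(conn, TRIPLES)
bs_tables = build_binary_store(conn, TRIPLES)

# Q1-style lookup on the triple store: a self-join on the subject column.
q1_ts = conn.execute(
    """SELECT t2.object
       FROM triples t1 JOIN triples t2 ON t1.subject = t2.subject
       WHERE t1.predicate = 'title' AND t1.object = 'Journal 1 (1940)'
         AND t2.predicate = 'issued'"""
).fetchall()

# The same lookup on the binary tables: a join between two predicate tables.
q1_bs = conn.execute(
    """SELECT i.value
       FROM bs_title t JOIN bs_issued i ON t.id = i.id
       WHERE t.value = 'Journal 1 (1940)'"""
).fetchall()
```

In this sketch the triple store answers the lookup with a self-join on one large table, while the binary store joins two small per-predicate tables. The RS and PS schemes depend on the DBLP entity model and are not reproduced here.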

Performance Metrics

We measure and compare the performance of the alternative relational RDF storage techniques using the following metrics:

1. Loading Time: the time taken to shred the RDF dataset into the relational tables of the storage scheme.

2. Storage Cost: the amount of disk space consumed by the relational storage scheme for storing the RDF dataset.

3. Query Performance: the execution times of the SQL translations of the SP2Bench SPARQL queries over the alternative relational storage schemes.
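
The three metrics can be collected along the following lines. This is a minimal sketch assuming a file-backed SQLite database and a single triple table; the function name measure, the synthetic triples, and the sample query are hypothetical and do not reflect the measurement setup of the study.

```python
import os
import sqlite3
import tempfile
import time


def measure(triples, query, params=()):
    """Return (loading_seconds, storage_bytes, query_seconds, rows) for one triple table."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    try:
        # Loading time: create the table and shred the triples into it.
        start = time.perf_counter()
        conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
        conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
        conn.commit()
        loading = time.perf_counter() - start

        # Storage cost: size of the database file on disk after the load.
        storage = os.path.getsize(path)

        # Query performance: wall-clock time of one SQL query.
        start = time.perf_counter()
        rows = conn.execute(query, params).fetchall()
        elapsed = time.perf_counter() - start
        return loading, storage, elapsed, rows
    finally:
        conn.close()
        os.remove(path)


# Synthetic data (an assumption): 1000 articles spread over ten publication years.
triples = [(f"article{i}", "issued", str(1940 + i % 10)) for i in range(1000)]
loading, storage, elapsed, rows = measure(
    triples, "SELECT COUNT(*) FROM triples WHERE object = ?", ("1940",)
)
print(f"load: {loading:.4f}s  size: {storage} bytes  query: {elapsed:.6f}s")
```

A real comparison would repeat each query several times and separate cold-cache from warm-cache runs; a single timing as shown here is only indicative.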
