Relational Techniques for Storing and Querying RDF Data - Advanced Database Query Systems

around one RDF element and defines a prioritization between the other two elements. Two vectors are

associated with each RDF element (e.g., subject), one for each of the other two RDF elements (e.g.,

property and object). In addition, lists of the third RDF element are appended to the elements in these

vectors. In total, six distinct indices are used for indexing the RDF data. These indices materialize all

possible orders of precedence of the three RDF elements. A clear disadvantage of this approach is that

Hexastore features a worst-case fivefold storage increase in comparison to a conventional triples table.
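
The six-index idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names ORDERS and build_hexastore and the sample triples are hypothetical, and the sketch keeps a separate copy of the terminal lists in every index, so it does not reproduce the storage layout or the space bound of the actual system.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical illustration; these names are not taken from Hexastore itself.
# The six orderings of subject (s), property (p) and object (o).
ORDERS = ["".join(p) for p in permutations("spo")]


def build_hexastore(triples):
    """Return six nested indices, one per ordering of the three RDF elements."""
    indices = {order: defaultdict(lambda: defaultdict(set)) for order in ORDERS}
    for s, p, o in triples:
        parts = {"s": s, "p": p, "o": o}
        for order in ORDERS:
            first, second, third = (parts[k] for k in order)
            # first element -> vector of second elements -> list of third elements
            indices[order][first][second].add(third)
    return indices


triples = [
    ("alice", "knows", "bob"),
    ("alice", "name", "Alice"),
    ("bob", "knows", "carol"),
]
hexastore = build_hexastore(triples)

# "Whom does alice know?" is answered from the subject-property-object index.
known_by_alice = hexastore["spo"]["alice"]["knows"]
# "Who knows carol?" is answered from the object-property-subject index.
knows_carol = hexastore["ops"]["carol"]["knows"]
```

Whatever element a query binds first, one of the six indices has that element as its leading key, which is the benefit bought with the extra storage.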

PROPERTY TABLE STORES

Due to the proliferation of self-joins involved with the triple-store, the property table approach was

proposed. The main idea of this approach is to create separate n-ary tables (property tables) that store

subjects that tend to share common properties together in a single table. Hence, designing the schema of the

property tables depends on the availability of either explicit or implicit information about the characteristics

of the objects in the RDF dataset.
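
To make the self-join problem concrete, the following minimal sketch compares the same lookup on a triples table and on an n-ary property table, using Python's built-in sqlite3 module. The table names (triples, person), the column names and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE triples (subject TEXT, property TEXT, object TEXT);
INSERT INTO triples VALUES
  ('p1', 'name', 'Alice'), ('p1', 'phone', '555-0100'),
  ('p2', 'name', 'Bob'),   ('p2', 'phone', '555-0101');

-- Hypothetical property table: one row per subject, one column per property.
CREATE TABLE person (subject TEXT PRIMARY KEY, name TEXT, phone TEXT);
INSERT INTO person VALUES
  ('p1', 'Alice', '555-0100'), ('p2', 'Bob', '555-0101');
""")

# Triples table: every additional property in the query costs one self-join.
triple_rows = conn.execute("""
    SELECT n.object, ph.object
    FROM triples AS n
    JOIN triples AS ph ON n.subject = ph.subject
    WHERE n.property = 'name' AND ph.property = 'phone'
    ORDER BY n.subject
""").fetchall()

# Property table: the same answer is read from a single row, with no join.
table_rows = conn.execute(
    "SELECT name, phone FROM person ORDER BY subject"
).fetchall()
```

Both queries return the same name and phone pairs; the difference is that the second one needs no join, provided the schema designer knew in advance that name and phone occur together.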

Jena is an open-source toolkit for Semantic Web programmers (McBride, 2002). It implements

persistence for RDF graphs using an SQL database through a JDBC connection. The schema of the

first version of Jena, Jena1, consisted of a statement table, a literals table and a resources table. The

statement table (Subject, Predicate, ObjectURI, ObjectLiteral) contained all statements and referenced

the resources and literals tables for subjects, predicates and objects. To distinguish literal objects from

resource URIs, two columns were used. The literals table contained all literal values and the resources

table contained all resource URIs in the graph. However, every query operation required multiple joins

between the statement table and the literals table or the resources table.
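
A rough sketch of such a normalized layout, again using sqlite3, shows why even printing the statements requires several joins. Only the four statement-table columns come from the description above; the integer identifiers, the column types, the lower-case names and the example.org URIs are assumptions, not the actual Jena1 DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed layout: the statement table holds only references (integer ids).
CREATE TABLE resources (id INTEGER PRIMARY KEY, uri TEXT);
CREATE TABLE literals  (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE statements (
    subject        INTEGER REFERENCES resources(id),
    predicate      INTEGER REFERENCES resources(id),
    object_uri     INTEGER REFERENCES resources(id),
    object_literal INTEGER REFERENCES literals(id)
);
INSERT INTO resources VALUES
  (1, 'http://example.org/alice'), (2, 'http://example.org/name'),
  (3, 'http://example.org/knows'), (4, 'http://example.org/bob');
INSERT INTO literals VALUES (1, 'Alice');
-- Exactly one of the two object columns is filled in per statement.
INSERT INTO statements VALUES (1, 2, NULL, 1), (1, 3, 4, NULL);
""")

# Resolving the ids back to text takes one join per referenced column.
rows = conn.execute("""
    SELECT s.uri, p.uri, COALESCE(o.uri, l.value)
    FROM statements AS st
    JOIN resources AS s ON s.id = st.subject
    JOIN resources AS p ON p.id = st.predicate
    LEFT JOIN resources AS o ON o.id = st.object_uri
    LEFT JOIN literals  AS l ON l.id = st.object_literal
    ORDER BY p.uri
""").fetchall()
```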

To address this problem, the Jena2 schema trades off space for time. It uses a denormalized schema

in which resource URIs and simple literal values are stored directly in the statement table. In order to

distinguish database references from literals and URIs, column values are encoded with a prefix that

indicates the kind of value. A separate literals table is only used to store literal values whose

length exceeds a threshold, such as blobs. Similarly, a separate resources table is used to store long URIs.

By storing values directly in the statement table it is possible to perform many queries without a join.

However, a denormalized schema uses more database space because the same value (literal or URI) is

stored repeatedly. The increase in database space consumption is addressed by using string compression

schemes. Both Jena1 and Jena2 permit multiple graphs to be stored in a single database instance. In

Jena1, all graphs were stored in a single statement table. However, Jena2 supports the use of multiple

statement tables in a single database so that applications can flexibly map graphs to different tables. In this

way, graphs that are often accessed together may be stored together while graphs that are never accessed

together may be stored separately.
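
The prefix encoding of column values can be pictured with the following sketch. The prefixes "U:", "L:" and "R:", the threshold MAX_INLINE and the single long_values list (standing in for the separate literals and resources tables) are all hypothetical choices for illustration; the actual Jena2 encoding is not given here.

```python
# Hypothetical prefixes and threshold; not the actual Jena2 encoding.
MAX_INLINE = 32


def encode(kind, text, long_values):
    """Encode a URI (kind 'uri') or literal (kind 'lit') as one column value."""
    if len(text) <= MAX_INLINE:
        # Short values are stored inline, so reading them needs no join.
        return ("U:" if kind == "uri" else "L:") + text
    # Long values go to a side table; the column stores only a reference.
    long_values.append((kind, text))
    return "R:" + str(len(long_values) - 1)


def decode(cell, long_values):
    """Recover (kind, text) from an encoded column value."""
    tag, body = cell[:2], cell[2:]
    if tag == "R:":
        return long_values[int(body)]
    return ("uri" if tag == "U:" else "lit", body)


long_values = []
short_cell = encode("lit", "Alice", long_values)
long_uri = "http://example.org/" + "x" * 40
long_cell = encode("uri", long_uri, long_values)
```

Only the second value exceeds the threshold, so only it would cost an extra lookup at query time.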

Applications typically have access patterns in which certain subjects and/or properties are

accessed together. For example, a graph of data about persons might have many occurrences of objects

with properties name, address, phone, and gender that are referenced together. Jena2 uses property tables

as a general facility for clustering properties that are commonly accessed together. A property table is a

separate table that stores the subject-value pairs related by a particular property. A property table stores

all instances of that property in the graph, so the property does not appear in any other table used

for the graph. In Jena1, each query is evaluated with a single SQL select query over the statement table.

In Jena2, queries have to be generalized because there can be multiple statement tables for a graph.
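
One way to picture this generalization is a lookup that first decides which table holds a given property. The sketch below is an assumption-laden illustration, not Jena2's query processor: the tables statements and name_table, the mapping PROPERTY_TABLES and the function find are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT);
-- Hypothetical property table holding every 'name' statement of the graph.
CREATE TABLE name_table (subject TEXT, value TEXT);
INSERT INTO statements VALUES ('p1', 'knows', 'p2');
INSERT INTO name_table VALUES ('p1', 'Alice'), ('p2', 'Bob');
""")

# Assumed mapping from a property to the table that stores all its instances.
PROPERTY_TABLES = {"name": "name_table"}


def find(subject, predicate):
    """Return the objects of (subject, predicate, ?) from the right table."""
    table = PROPERTY_TABLES.get(predicate)
    if table is not None:
        # The table name comes from the fixed mapping above, never from input.
        sql = f"SELECT value FROM {table} WHERE subject = ?"
        params = (subject,)
    else:
        sql = "SELECT object FROM statements WHERE subject = ? AND predicate = ?"
        params = (subject, predicate)
    return [row[0] for row in conn.execute(sql, params)]


names = find("p1", "name")     # served by the property table
friends = find("p1", "knows")  # served by the statement table
```

Because a property stored in a property table appears in no other table of the graph, the lookup never has to merge results from two places for the same property.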
