Database Reference
around one RDF element and defines a prioritization between the other two elements. Two vectors are
associated with each RDF element (e.g., subject), one for each of the other two RDF elements (e.g.,
property and object). In addition, lists of the third RDF element are appended to the elements in these
vectors. In total, six distinct indices are used for indexing the RDF data. These indices materialize all
possible orders of precedence of the three RDF elements. A clear disadvantage of this approach is that
Hexastore features a worst-case fivefold storage increase in comparison to a conventional triples table.
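The six orderings can be illustrated with a minimal in-memory sketch. It is not the Hexastore implementation: the function name build_hexastore, the nested-dictionary layout, and the sample triples are assumptions made for illustration, and the sketch makes no attempt to model the actual storage layout or its space cost.

```python
from collections import defaultdict
from itertools import permutations

ROLES = ("s", "p", "o")  # subject, property, object


def build_hexastore(triples):
    """Build six nested indices, one per ordering of (s, p, o).

    Each index maps first element -> second element -> set of third elements.
    """
    indices = {}
    for order in permutations(ROLES):  # spo, sop, pso, pos, osp, ops
        index = defaultdict(lambda: defaultdict(set))
        for s, p, o in triples:
            value = {"s": s, "p": p, "o": o}
            first, second, third = (value[role] for role in order)
            index[first][second].add(third)
        indices["".join(order)] = index
    return indices


# Hypothetical sample data.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "name", "Alice"),
    ("bob", "knows", "carol"),
]
hexa = build_hexastore(triples)

# Objects for subject "alice" and property "knows" (spo ordering).
alice_knows = hexa["spo"]["alice"]["knows"]
# Subjects with property "knows" and object "carol" (pos ordering).
knows_carol = hexa["pos"]["knows"]["carol"]
```

Whichever elements of a triple pattern are bound, one of the six orderings has them as its leading keys, so the lookup reduces to dictionary accesses.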
PROPERTY TABLE STORES
Because queries over a triple store involve a proliferation of self-joins, the property table approach was
proposed. The main idea of this approach is to create separate n-ary tables (property tables) that group
subjects that tend to share the same properties into a single table. Hence, designing the schema of the
property tables depends on the availability of either explicit or implicit information about the characteristics
of the objects in the RDF dataset.
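The difference between the two layouts can be sketched with Python's built-in sqlite3 module. The table names (triples, person), the column names, and the sample rows are hypothetical and are not taken from any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic triples table (hypothetical schema).
CREATE TABLE triples (subject TEXT, property TEXT, object TEXT);
INSERT INTO triples VALUES
  ('p1', 'name', 'Alice'), ('p1', 'phone', '555-0100'),
  ('p2', 'name', 'Bob'),   ('p2', 'phone', '555-0101');

-- Property table for subjects that share the properties name and phone.
CREATE TABLE person (subject TEXT PRIMARY KEY, name TEXT, phone TEXT);
INSERT INTO person VALUES
  ('p1', 'Alice', '555-0100'), ('p2', 'Bob', '555-0101');
""")

# Triples table: each additional property in the query adds a self-join.
via_triples = conn.execute("""
    SELECT n.object, ph.object
    FROM triples AS n
    JOIN triples AS ph ON n.subject = ph.subject
    WHERE n.property = 'name' AND ph.property = 'phone'
    ORDER BY n.subject
""").fetchall()

# Property table: the same answer comes from one table with no join.
via_property_table = conn.execute(
    "SELECT name, phone FROM person ORDER BY subject"
).fetchall()
```

Both queries return the same name and phone pairs; only the triples-table version needs to join the table with itself.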
Jena is an open-source toolkit for Semantic Web programmers (McBride, 2002). It implements
persistence for RDF graphs using an SQL database through a JDBC connection. The schema of the
first version of Jena, Jena1, consisted of a statement table, a literals table and a resources table. The
statement table (Subject, Predicate, ObjectURI, ObjectLiteral) contained all statements and referenced
the resources and literals tables for subjects, predicates and objects. To distinguish literal objects from
resource URIs, two columns were used. The literals table contained all literal values and the resources
table contained all resource URIs in the graph. However, every query operation required multiple joins
between the statement table and the literals table or the resources table.
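The join cost of such a normalized layout can be seen in a small sqlite3 sketch. The column names, integer identifiers, and example URIs below are assumptions made for illustration; they are not the actual Jena1 DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalized schema in the spirit of the description above.
CREATE TABLE resources (id INTEGER PRIMARY KEY, uri TEXT);
CREATE TABLE literals  (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE statements (
    subject        INTEGER,  -- references resources.id
    predicate      INTEGER,  -- references resources.id
    object_uri     INTEGER,  -- references resources.id, NULL for literals
    object_literal INTEGER   -- references literals.id, NULL for URIs
);
INSERT INTO resources VALUES
  (1, 'http://example.org/alice'), (2, 'http://example.org/name'),
  (3, 'http://example.org/knows'), (4, 'http://example.org/bob');
INSERT INTO literals VALUES (1, 'Alice');
INSERT INTO statements VALUES (1, 2, NULL, 1), (1, 3, 4, NULL);
""")

# Listing readable statements needs a join for every column that
# holds a reference: subject, predicate, and either kind of object.
rows = conn.execute("""
    SELECT s.uri, p.uri, COALESCE(o.uri, l.value)
    FROM statements AS st
    JOIN resources AS s ON s.id = st.subject
    JOIN resources AS p ON p.id = st.predicate
    LEFT JOIN resources AS o ON o.id = st.object_uri
    LEFT JOIN literals  AS l ON l.id = st.object_literal
    ORDER BY p.uri
""").fetchall()
```

Even this simple listing touches the resources table three times and the literals table once.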
To address this problem, the Jena2 schema trades space for time. It uses a denormalized schema
in which resource URIs and simple literal values are stored directly in the statement table. In order to
distinguish database references from literals and URIs, column values are encoded with a prefix that
indicates the kind of value. A separate literals table is used only to store literal values whose
length exceeds a threshold, such as blobs. Similarly, a separate resources table is used to store long URIs.
By storing values directly in the statement table it is possible to perform many queries without a join.
However, a denormalized schema uses more database space because the same value (literal or URI) is
stored repeatedly. The increase in database space consumption is addressed by using string compression
schemes. Both Jena1 and Jena2 permit multiple graphs to be stored in a single database instance. In
Jena1, all graphs were stored in a single statement table. Jena2, however, supports the use of multiple statement
tables in a single database so that applications can flexibly map graphs to different tables. In this
way, graphs that are often accessed together may be stored together while graphs that are never accessed
together may be stored separately.
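The prefix encoding of column values can be sketched as follows. The prefix strings ("Lv:" for an inline literal, "Lr:" for a reference to the literals table, "Uv:" for an inline URI), the MAX_INLINE threshold, and the dictionary standing in for the separate literals table are all hypothetical; the text above does not specify the real encoding or threshold.

```python
MAX_INLINE = 20  # hypothetical length threshold for inline storage

# Stand-in for the separate literals table: reference id -> long value.
long_literals = {}


def encode_literal(value):
    """Return the cell stored in the statement table for a literal."""
    if len(value) <= MAX_INLINE:
        return "Lv:" + value          # short literal stored inline
    ref = len(long_literals) + 1
    long_literals[ref] = value        # long literal stored out of line
    return "Lr:" + str(ref)           # statement table keeps a reference


def encode_uri(uri):
    """Return the cell for a URI (long-URI handling omitted for brevity)."""
    return "Uv:" + uri


def decode(cell):
    """Recover the original value from an encoded cell."""
    tag, payload = cell[:3], cell[3:]
    if tag == "Lr:":
        return long_literals[int(payload)]  # the only case needing a lookup
    return payload


short_cell = encode_literal("Alice")
long_cell = encode_literal("x" * 50)
uri_cell = encode_uri("http://example.org/alice")
```

Only cells tagged as references require a second lookup; inline values are read directly from the statement table, which is the space-for-time trade described above.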
In practice, applications typically have access patterns in which certain subjects and/or properties are
accessed together. For example, a graph of data about persons might have many occurrences of objects
with properties name, address, phone, and gender that are referenced together. Jena2 uses property tables
as a general facility for clustering properties that are commonly accessed together. A property table is a
separate table that stores the subject-value pairs related by a particular property. It stores
all instances of that property in the graph, and the property does not appear in any other table used
for the graph. In Jena1, each query is evaluated with a single SQL select query over the statement table.
In Jena2, queries have to be generalized because there can be multiple statement tables for a graph.
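One way to picture this generalization is a lookup that first decides which table holds a given property. The sketch below is an assumption-laden illustration, not Jena2's query processor: the PROPERTY_TABLES catalog, the table and column names, and the helper objects_of are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic statement table plus one property table (hypothetical names).
CREATE TABLE statements (subject TEXT, property TEXT, object TEXT);
CREATE TABLE name_table (subject TEXT, value TEXT);
INSERT INTO statements VALUES ('p1', 'knows', 'p2');
INSERT INTO name_table VALUES ('p1', 'Alice'), ('p2', 'Bob');
""")

# Hypothetical catalog: property -> table holding all of its instances.
PROPERTY_TABLES = {"name": "name_table"}


def objects_of(subject, prop):
    """Return the values of `prop` for `subject`, routed to the right table."""
    table = PROPERTY_TABLES.get(prop)
    if table is not None:
        # Table name comes from the trusted catalog above, not from user input.
        sql = f"SELECT value FROM {table} WHERE subject = ?"
        params = (subject,)
    else:
        sql = "SELECT object FROM statements WHERE subject = ? AND property = ?"
        params = (subject, prop)
    return [row[0] for row in conn.execute(sql, params)]


name_of_p1 = objects_of("p1", "name")    # answered from the property table
known_by_p1 = objects_of("p1", "knows")  # answered from the statement table
```

Because a property stored in a property table appears in no other table for the graph, a single routed lookup suffices when the property is known.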