Relational Techniques for Storing and Querying RDF Data - Advanced Database Query Systems

Using knowledge of the frequent access patterns to construct the property tables and to influence the underlying database storage structures can provide a performance benefit and reduce the number of join operations during the query evaluation process.

Chong et al. (2005) introduced RDFMATCH, an Oracle-based SQL table function for querying RDF data. The results of the RDFMATCH table function can be further processed using SQL's rich querying capabilities and seamlessly combined with queries on traditional relational data. At its core, an RDFMATCH query translates into a self-join query on a triple-based RDF table store. The resulting query is executed efficiently by making use of B-tree indexes and by creating materialized join views for specialized subject-property matrices. These subject-property matrix materialized join views are used to minimize the query processing overheads that are inherent in the canonical triple-based representation of RDF. The materialized join views are incrementally maintained based on user demand and query workloads.

A special module is provided to analyze the table of RDF triples and estimate the size of the various materialized views, based on which a user can define a subset of materialized views. For a group of subjects, the system defines a set of single-valued properties that occur together. These can be direct properties of these subjects or nested properties. A property $p_1$ is a direct property of subject $x_1$ if there is a triple $(x_1, p_1, x_2)$. A property $p_m$ is a nested property of subject $x_1$ if there is a set of triples $(x_1, p_1, x_2), \ldots, (x_m, p_m, x_{m+1})$, where $m > 1$. For example, given the triples (John, address, addr1) and (addr1, zip, 03062), zip is a nested property of John.
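
The self-join that a triple-based store needs in order to resolve a nested property can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module, not the Oracle RDFMATCH function itself; the table name `triples`, the index name `idx_subject_property`, and the helper `nested_property` are hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical in-memory triple table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
# SQLite implements indexes as B-trees, which helps the join lookups below.
conn.execute("CREATE INDEX idx_subject_property ON triples (subject, property)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("John", "address", "addr1"),
        ("addr1", "zip", "03062"),
    ],
)


def nested_property(conn, subject, p1, p2):
    """Follow subject -p1-> x -p2-> value with one self-join on the triple table."""
    rows = conn.execute(
        """
        SELECT t2.object
        FROM triples AS t1
        JOIN triples AS t2 ON t2.subject = t1.object
        WHERE t1.subject = ? AND t1.property = ? AND t2.property = ?
        """,
        (subject, p1, p2),
    ).fetchall()
    return [row[0] for row in rows]


zip_codes = nested_property(conn, "John", "address", "zip")
print(zip_codes)
```

Each additional step in a property chain adds another self-join on the same table, which is the overhead that materialized join views are meant to avoid.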

Levandoski & Mokbel (2009) presented another property table approach for storing RDF data that makes no assumptions about query workload statistics. The main goals of this approach are: (1) reducing the number of join operations required during the RDF query evaluation process by storing related RDF properties together, and (2) reducing the need to process extra data by tuning null storage to fall below a given threshold. The approach provides a tailored schema for each RDF data set, which represents a balance between property tables and binary tables and is based on two main parameters: (1) the support threshold, a value that measures the strength of correlation between properties in the RDF data, and (2) the null threshold, the percentage of null storage tolerated for each table in the schema. The approach involves two phases: clustering and partitioning. The clustering phase scans the RDF data to automatically discover groups of related properties (i.e., properties that always exist together for a large number of subjects). Based on the support threshold, each set of n properties grouped together in the same cluster is a good candidate to constitute a single n-ary table, and the properties that are not grouped in any cluster are good candidates for storage in binary tables. The partitioning phase goes over the formed clusters and balances the tradeoff between storing as many RDF properties in clusters as possible and keeping null storage to a minimum based on the null threshold. A main concern of the partitioning phase is to ensure that the clusters do not overlap, so that each property exists in a single cluster; this reduces the number of table accesses and unions necessary in query processing.
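
To make the two parameters more concrete, the following minimal sketch computes a support value and a null fraction for candidate property groups over a toy data set. The data, the function names, and the exact definitions used here (support as the fraction of subjects carrying every property in the group, null fraction as the share of empty cells in the resulting n-ary table) are simplifying assumptions for illustration, not the authors' exact formulation.

```python
from itertools import combinations

# Hypothetical toy data: (subject, property, object) triples.
TRIPLES = [
    ("book1", "title", "A"), ("book1", "author", "X"), ("book1", "isbn", "1"),
    ("book2", "title", "B"), ("book2", "author", "Y"),
    ("book3", "title", "C"), ("book3", "author", "Z"), ("book3", "isbn", "3"),
    ("cd1", "title", "D"), ("cd1", "artist", "W"),
]


def properties_by_subject(triples):
    """Map each subject to the set of properties it uses."""
    result = {}
    for subject, prop, _ in triples:
        result.setdefault(subject, set()).add(prop)
    return result


def support(props, by_subject):
    """Assumed definition: fraction of subjects that have every property in props."""
    hits = sum(1 for have in by_subject.values() if props <= have)
    return hits / len(by_subject)


def null_fraction(props, by_subject):
    """Assumed definition: share of empty cells in an n-ary table over props.

    A subject gets a row if it has at least one of the properties.
    """
    rows = [have for have in by_subject.values() if have & props]
    if not rows:
        return 0.0
    cells = len(rows) * len(props)
    filled = sum(len(have & props) for have in rows)
    return (cells - filled) / cells


def candidate_clusters(by_subject, size, min_support):
    """List property groups of the given size whose support meets the threshold."""
    all_props = sorted(set().union(*by_subject.values()))
    return [
        frozenset(group)
        for group in combinations(all_props, size)
        if support(frozenset(group), by_subject) >= min_support
    ]


by_subject = properties_by_subject(TRIPLES)
clusters = candidate_clusters(by_subject, size=2, min_support=0.6)
for cluster in clusters:
    print(sorted(cluster), null_fraction(cluster, by_subject))
```

On this toy data only the pair (author, title) reaches a support of 0.6. Adding isbn to that table would raise its null fraction from 0.125 to 0.25, which is the kind of tradeoff the null threshold is meant to control.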

Matono et al. (2005) proposed a path-based relational RDF database. The main focus of this approach is to improve the performance of path queries by extracting all reachable path expressions for each resource and storing them. Thus, there is no need to perform join operations, unlike the flat triple stores or the property tables approach. In this approach, the RDF graph is divided into subgraphs, and each subgraph is then stored, using a technique applicable to it, in distinct relational tables. More precisely, all classes and properties are extracted from the RDF schema data, and all resources are extracted from the RDF data. Each extracted item is assigned an identifier and a path expression and is stored in the corresponding relational table.
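
The idea of precomputing path expressions can be sketched roughly as below. This is a simplified illustration rather than the scheme of Matono et al.: it assumes that paths start at root resources (those that never appear as an object), encodes a path as property names joined by "/", and bounds the depth to guard against cycles. The function name `path_table` and the extra `name` triple are hypothetical.

```python
from collections import defaultdict

# Toy data; the "name" triple is a hypothetical addition for illustration.
TRIPLES = [
    ("John", "address", "addr1"),
    ("addr1", "zip", "03062"),
    ("John", "name", "John Smith"),
]


def path_table(triples, max_depth=5):
    """Return rows (root, path_expression, reached_resource).

    Paths start at resources that never occur as an object; the depth
    bound keeps the traversal finite if the graph contains cycles.
    """
    out_edges = defaultdict(list)
    targets = set()
    for subject, prop, obj in triples:
        out_edges[subject].append((prop, obj))
        targets.add(obj)

    roots = [subject for subject in out_edges if subject not in targets]
    rows = set()
    stack = [(root, root, ()) for root in roots]
    while stack:
        root, node, path = stack.pop()
        if len(path) >= max_depth:
            continue
        for prop, nxt in out_edges.get(node, []):
            new_path = path + (prop,)
            rows.add((root, "/".join(new_path), nxt))
            stack.append((root, nxt, new_path))
    return rows


rows = path_table(TRIPLES)
# A path query becomes a plain selection on the stored path expression.
zip_of_john = [r for r in rows if r[0] == "John" and r[1] == "address/zip"]
print(zip_of_john)
```

Once the rows are stored in a relational table, a query such as "the zip reached from John through address" is a single selection on the path column and needs no join.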
