Using knowledge of the frequent access patterns to construct the property tables and to influence the
underlying database storage structures can provide a performance benefit and reduce the number of join
operations during the query evaluation process.
Chong et al. (2005) have introduced an Oracle-based SQL table function, RDFMATCH, to query
RDF data. The results of the RDFMATCH table function can be further processed by SQL's rich querying
capabilities and seamlessly combined with queries on traditional relational data. The core implementation
of an RDFMATCH query translates to a self-join query on a triple-based RDF table store. The resulting
query is executed efficiently by making use of B-tree indexes as well as by creating materialized join views
for specialized subject-property matrices. These subject-property matrix materialized join views are used to minimize
the query processing overheads that are inherent in the canonical triple-based representation of RDF.
The materialized join views are incrementally maintained based on user demand and query workloads.
A special module is provided to analyze the table of RDF triples and estimate the size of various materialized
views, based on which a user can define a subset of materialized views. For a group of subjects,
the system defines a set of single-valued properties that occur together. These can be direct properties
of these subjects or nested properties. A property p_1 is a direct property of subject x_1 if there is a triple
(x_1, p_1, x_2). A property p_m is a nested property of subject x_1 if there is a set of triples such as (x_1, p_1, x_2), ...,
(x_m, p_m, x_{m+1}), where m > 1. For example, if there is a set of triples (John, address, addr1), (addr1, zip,
03062), then zip is a nested property of John.
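The self-join on a triple table and the notion of a nested property can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than Oracle, and it is not the RDFMATCH implementation; the table name `triples`, the index name `idx_sp`, the helper `nested_property`, and the extra `name` triple are assumptions made only for this illustration.

```python
import sqlite3

# In-memory triple store: one row per (subject, property, object) triple.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("John", "address", "addr1"),
        ("addr1", "zip", "03062"),
        ("John", "name", "John Smith"),  # hypothetical extra triple
    ],
)
# SQLite indexes are B-trees, so this stands in for the B-tree indexes
# mentioned in the text (the choice of indexed columns is an assumption).
conn.execute("CREATE INDEX idx_sp ON triples (subject, property)")


def nested_property(conn, subject, p1, p2):
    """Follow subject -p1-> x -p2-> value using one self-join."""
    rows = conn.execute(
        """
        SELECT t2.object
        FROM triples AS t1
        JOIN triples AS t2 ON t2.subject = t1.object
        WHERE t1.subject = ? AND t1.property = ? AND t2.property = ?
        """,
        (subject, p1, p2),
    ).fetchall()
    return [row[0] for row in rows]


# zip is a nested property of John, reached through address.
zips = nested_property(conn, "John", "address", "zip")
print(zips)
```

Each additional step in a nested property adds another self-join on the triple table, which is the overhead that materialized join views are meant to reduce.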
Levandoski & Mokbel (2009) have presented another property table approach for storing RDF data
without any assumption about the query workload statistics. The main goals of this approach are: (1)
reducing the number of join operations required during the RDF query evaluation process by
storing related RDF properties together, and (2) reducing the need to process extra data by tuning null storage
to fall below a given threshold. The approach provides a tailored schema for each RDF data set, which
represents a balance between property tables and binary tables and is based on two main parameters: (1)
the support threshold, a value that measures the strength of correlation between properties in
the RDF data, and (2) the null threshold, the percentage of null storage tolerated for each
table in the schema. The approach involves two phases: clustering and partitioning. The clustering phase
scans the RDF data to automatically discover groups of related properties (i.e., properties that always
exist together for a large number of subjects). Based on the support threshold, each set of n properties
grouped together in the same cluster is a good candidate to constitute a single n-ary table, while
properties that are not grouped in any cluster are good candidates for storage in binary tables. The
partitioning phase goes over the formed clusters and balances the tradeoff between storing as many RDF
properties in clusters as possible while keeping null storage to a minimum based on the null threshold.
A main concern of the partitioning phase is to ensure that the clusters do not overlap, so that
each property exists in a single cluster; this reduces the number of table accesses
and unions necessary in query processing.
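The roles of the two thresholds can be made concrete with a small sketch. It is not the authors' algorithm: the definitions of support (fraction of subjects carrying every property in a set) and of null fraction (share of empty cells in an n-ary table whose rows are the subjects having at least one of the properties) are assumptions chosen for illustration, as are the sample data and the function names. The brute-force enumeration of property sets is only practical for tiny inputs.

```python
from itertools import combinations

# Hypothetical data: each subject mapped to the set of properties it has.
subject_props = {
    "s1": {"name", "email", "phone"},
    "s2": {"name", "email"},
    "s3": {"name", "email", "homepage"},
    "s4": {"title"},
}


def support(props, subject_props):
    """Fraction of subjects that carry every property in props (assumed definition)."""
    props = set(props)
    hits = sum(1 for have in subject_props.values() if props <= have)
    return hits / len(subject_props)


def null_fraction(props, subject_props):
    """Fraction of empty cells in an n-ary table over props (assumed definition).

    Rows are the subjects that have at least one property in props.
    """
    props = set(props)
    rows = [have for have in subject_props.values() if have & props]
    if not rows:
        return 0.0
    filled = sum(len(have & props) for have in rows)
    return 1 - filled / (len(rows) * len(props))


def candidate_clusters(subject_props, size, support_threshold):
    """Property sets of a given size whose support meets the threshold."""
    all_props = sorted(set().union(*subject_props.values()))
    return [
        combo
        for combo in combinations(all_props, size)
        if support(combo, subject_props) >= support_threshold
    ]


clusters = candidate_clusters(subject_props, size=2, support_threshold=0.75)
print(clusters)
print(null_fraction({"name", "email", "phone"}, subject_props))
```

In this toy data set, widening a table from (email, name) to include phone introduces empty cells for s2 and s3, which is the kind of tradeoff the null threshold is meant to bound.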
Matono et al. (2005) have proposed a path-based relational RDF database. The main focus of this
approach is to improve the performance of path queries by extracting all reachable path expressions
for each resource and storing them. Thus, there is no need to perform join operations, unlike in flat triple
stores or the property table approach. In this approach, the RDF graph is divided into subgraphs, and
each subgraph is then stored in distinct relational tables using the applicable technique. More precisely, all
classes and properties are extracted from RDF schema data, and all resources are extracted from
RDF data. Each extracted item is assigned an identifier and a path expression and stored in the corresponding
relational table.
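The idea of precomputing path expressions so that a path query becomes a lookup rather than a chain of joins can be sketched as follows. This is an illustration under stated assumptions, not the scheme of Matono et al.: paths are taken to start at root nodes (nodes that never appear as an object), the graph is assumed to be acyclic, the `/` separator is arbitrary, and the function names and the `knows` triple are hypothetical.

```python
from collections import defaultdict

triples = [
    ("John", "address", "addr1"),
    ("addr1", "zip", "03062"),
    ("John", "knows", "Mary"),  # hypothetical extra triple
]


def path_expressions(triples):
    """Map each resource to the property paths that reach it from a root.

    Assumes the graph is acyclic; a cycle would make the walk recurse forever.
    """
    out_edges = defaultdict(list)
    nodes, targets = set(), set()
    for s, p, o in triples:
        out_edges[s].append((p, o))
        nodes.update((s, o))
        targets.add(o)
    roots = sorted(nodes - targets)  # nodes with no incoming edge

    paths = defaultdict(set)

    def walk(node, prefix):
        for p, o in out_edges.get(node, []):
            path = prefix + (p,)
            paths[o].add("/".join(path))
            walk(o, path)

    for root in roots:
        walk(root, ())
    return dict(paths)


def resources_by_path(paths):
    """Invert the mapping so a path query is a single dictionary lookup."""
    index = defaultdict(set)
    for resource, exprs in paths.items():
        for expr in exprs:
            index[expr].add(resource)
    return dict(index)


paths = path_expressions(triples)
index = resources_by_path(paths)
print(index["address/zip"])
```

With the stored path expressions, finding every resource reachable via address followed by zip requires no join, at the cost of extra storage and work when the data is loaded.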