Database Reference
In-Depth Information
Therefore, several approaches have been proposed to deal with this limitation by using extensive set of
indexes or by using selectivity estimation information to optimize the join ordering.
Harris & Gibbins (2003) have described the 3store RDF storage system. The storage system of 3Store
is based on a central triple table which holds the hashes for the subject, predicate, object and graph
identifier. The graph identifier equal to zero if the triple resides in the anonymous background graph. A
symbols table is used to allow reverse lookups from the hash to the hashed value, for example, to return
results. Furthermore it allows SQL operations to be performed on pre-computed values in the data types
of the columns without the use of casts. For evaluating SPARQL queries, the triples table is joined once
for each triple in the graph pattern where variables are bound to their values when they encounter the
slot in which the variable appears. Subsequent occurrences of variables in the graph pattern are used to
constrain any appropriate joins with their initial binding. To produce the intermediate results table, the
hashes of any SPARQL variables required to be returned in the results set are projected and the hashes
from the intermediate results table are joined to the symbols table to provide the textual representation
of the results.
Neumann & Weikum (2008) have presented the RDF-3X (RDF Triple eXpress) RDF query engine
which tries to overcome the criticism that triples stores incurs too many expensive self-joins by creat-
ing the exhaustive set of indexes and relying on fast processing of merge joins. The physical design of
RDF-3x is workload-independent and eliminates the need for physical-design tuning by building indexes
over all 6 permutations of the three dimensions that constitute an RDF triple. Additionally, indexes over
count-aggregated variants for all three two-dimensional and all three one-dimensional projections. The
query processor follows RISC-style design philosophy (Chaudhuri & Weikum, 2000) by using the full
set of indexes on the triple tables to rely mostly on merge joins over sorted index lists. The query op-
timizer relies upon its cost model in finding the lowest-cost execution plan and mostly focuses on join
order and the generation of execution plans. In principle, selectivity estimation has a huge impact on
plan generation. While this is a standard problem in database systems, the schema-free nature of RDF
data makes the problem more challenging. RDF-3X employs dynamic programming for plan enumera-
tion, with a cost model based on RDF-specific statistical synopses. It relies on two kinds of statistics:
1) specialized histograms which are generic and can handle any kind of triple patterns and joins. The
disadvantage of histograms is that it assumes independence between predicates. 2) frequent join paths
in the data which give more accurate estimation. During query optimization, the query optimizer uses
the join-path selectivity information when available and otherwise assume independence and use the
histograms information. Neumann & Weikum (2009) have extended the work further by introducing a
runtime technique for accelerating query executions. It uses a light-weight, RDF-specific technique for
sideways information passing across different joins and index scans within the query execution plans.
They have also enhanced the selectivity estimator of the query optimizer by using very fast index lookups
on specifically designed aggregation indexes, rather than relying on the usual kinds of coarse-grained
histograms. This provides much more accurate estimates at compile-time, at a fairly small cost that is
easily amortized by providing better directives for the join-order optimization.
Weiss et al. (2008) have presented the Hexastore RDF storage scheme with main focuses on scalability
and generality in its data storage, processing and representation. Hexastore is based on the idea of indexing
the RDF data in a multiple indexing scheme (Harth & Decker, 2005). It does not discriminate against any
RDF element and treats subjects, properties and objects equally. Each RDF element type have its special
index structures built around it. Moreover, every possible ordering of the importance or precedence of
the three elements in an indexing scheme is materialized. Each index structure in a Hexastore centers
Search WWH ::




Custom Search