Database Reference
In-Depth Information
Hive [44] provide high-level data flow or query languages and automatic optimiza-
tion techniques a la relational database systems, and are gaining popularity.
In the context of scalable processing of Semantic Web data, query processing
techniques have to address the unique challenges posed by semantic graph data mod-
els such as W3C's Resource Description Framework (RDF). In RDF, data is mod-
eled at a fine-grained level in the form of what are called statements or triples . A
triple ( Subject , Property , Object ) asserts that a resource (the Subject ) has a Property
whose value (the Object ) is another resource or a literal. Figure 6.1a shows a subset
of triples describing vendor-product offerings in the commerce context. The exam-
ple triple ( &Off 1, product , &Pr 1) states that an offer &Off 1 is for a product &Pr 1.
Further, nodes and edges may be linked to an ontological layer that defines types of
nodes and edges and relationships between node types and edge types. Therefore, an
RDF model may entail additional statements beyond what is explicitly asserted, and
entailed statements can be derived using inferencing techniques. An RDF database
is a collection of triples but can also be viewed as a directed labeled graph with nodes
representing subjects and objects , and labeled edges denoting property types.
The foundational construct for querying an RDF graph is a triple pattern , which
is a triple with a variable (denoted by leading '?') in any of the Subject, Property, or
Object positions, for example, (? o vendor ? v ). A triple t in the database is considered
a valid “match” for a triple pattern tp if some substitution of the variables in tp with
RDF terms (resources or literals), yields t . For example, triple t = (& Off 1, vendor ,
& V 1) is a match for triple pattern (? o vendor ? v ) because the substitutions ? o/ & Off 1
and ? v /& V 1 produce a valid triple t in the database. Multiple triple patterns can be
combined using combination operators to form Graph Patterns . A Basic Graph
Pattern is a conjunction of two or more triple patterns and shared variables across
triple patterns implicitly represent equi-joins. The example query Q1 in Figure 6.1b
is a graph pattern with 10 triple patterns with 9 implicit joins (3 with ? v , 3 with ? o ,
2 with ? r , 1 with ? prod ). The answer to a graph pattern is the list of substitutions for
each variable that agree across triple patterns (i.e., meet join conditions).
An important consequence of RDF's fine-grained data model is that query work-
loads contain much larger number of joins than relational workloads. In fact, up to
100 join operations have been reported in queries for some real applications [43].
This number can be further increased if a complete result, including entailed triples,
is desired and inferencing is done as query expansion during processing. The first
challenge in this context is that relational query plan generation strategies will be
confounded by the large search space generated by such numbers of joins. Further,
the heuristics that are used to prune search space such as restricting to only left-linear
plans, are not ideal for RDF queries. In fact, bushy plans are often preferred since
many of the joins are m-way star joins and can be evaluated as merge joins if suitable
sorted indexes exist. Consequently, such queries have a much larger query plan search
space than existing strategies are suited for. A second subtle but critical point is that
cost-based query optimization techniques used for ordering join operations assume
conditions that do not carry over to RDF models. For example, the containment of
value sets assumption that underlies join cardinality estimation exploits the referential
integrity constraint in relational models that allows this assumption to hold in many
relational join scenarios (e.g., join between foreign and primary key fields). However,
Search WWH ::




Custom Search