TABLE 5.4
SPARQL Triple Pattern Mapping Using HBase Predicate Push-Down Filters

Pattern         Table                            Filter
(s, p, o)       T_s_po or T_o_ps                 Column and value
(?s, p, o)      T_o_ps                           Column
(s, ?p, o)      T_s_po or T_o_ps                 Value
(s, p, ?o)      T_s_po                           Column
(?s, ?p, o)     T_o_ps                           None
(?s, p, ?o)     T_s_po or T_o_ps (table scan)    Column
(s, ?p, ?o)     T_s_po                           None
(?s, ?p, ?o)    T_s_po or T_o_ps (table scan)    None
has similar performance characteristics to the six-table schema but uses
only one third of the storage space.
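Table 5.4 can be read as a small decision procedure. The Python sketch below is a hedged illustration, not the authors' code. It assumes, consistent with the filters in the table, that T_s_po uses the subject as row key, predicates as columns and objects as cell values, and that T_o_ps uses the object as row key, predicates as columns and subjects as cell values. Both HBase tables are modelled as nested dictionaries. The function names build_tables and match_pattern are hypothetical. In HBase the column and value filters would be pushed down to the region servers; here they are plain comparisons.

```python
from collections import defaultdict


def build_tables(triples):
    """Build hypothetical in-memory stand-ins for the two HBase tables.

    T_s_po: row key = subject, column = predicate, cell values = objects
    T_o_ps: row key = object,  column = predicate, cell values = subjects
    """
    s_po = defaultdict(lambda: defaultdict(set))
    o_ps = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        s_po[s][p].add(o)
        o_ps[o][p].add(s)
    return s_po, o_ps


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, s_po, o_ps):
    """Return all triples matching a pattern, choosing table and filter as in
    Table 5.4 (patterns with a repeated variable are not handled)."""
    s, p, o = pattern
    results = set()
    if not is_var(s):
        # Subject bound: one row lookup in T_s_po; a bound predicate acts as a
        # column filter and a bound object acts as a value filter.
        for pred, objs in s_po.get(s, {}).items():
            if not is_var(p) and pred != p:
                continue
            results.update((s, pred, obj) for obj in objs if is_var(o) or obj == o)
    elif not is_var(o):
        # Only the object bound: one row lookup in T_o_ps, with a column
        # filter if the predicate is bound.
        for pred, subjs in o_ps.get(o, {}).items():
            if not is_var(p) and pred != p:
                continue
            results.update((subj, pred, o) for subj in subjs)
    else:
        # Neither subject nor object bound: full table scan (here of T_s_po),
        # with a column filter if the predicate is bound.
        for subj, row in s_po.items():
            for pred, objs in row.items():
                if is_var(p) or pred == p:
                    results.update((subj, pred, obj) for obj in objs)
    return results
```

For example, with the triples (Alex, type, Person), (Bob, type, Person) and (Alex, knows, Bob), the pattern (?s, type, Person) is answered from the single T_o_ps row Person and yields the two type triples. The pattern (Alex, ?p, Bob) reads the T_s_po row Alex and keeps only cells whose value is Bob.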
Our experiments also revealed some fundamental scaling limitations of the storage
schema caused by T_o_ps. In general, an RDF data set uses a relatively small number
of classes but contains many triples that link resources to classes, for example,
(Alex, type, Person). Thus, using the object of a triple as the row key means that all
resources of the same class are stored in the same row. As the data set grows, these
rows become very large and exceed the configured maximum region size, resulting in
overloaded regions that contain only a single row. Since HBase never splits a single
row across regions, these regions cannot be split, and the resources of a single
machine become a bottleneck for scalability. To circumvent this problem, we use a
modified T_o_ps row key design for triples with predicate type. Instead of using the
object as the row key, we use a compound row key of object and subject, for example,
(Person|Alex). As a result, we can no longer access all resources of a class with a
single row lookup. However, HBase stores rows sorted by row key, so the corresponding
rows are consecutive in T_o_ps, and we can use an efficient range scan starting at the
first entry of the class.
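The effect of the compound row key can be sketched in a few lines. This is a simplified model under stated assumptions, not the authors' implementation. The separator character '|' is an assumption. Only the row keys are modelled, as a sorted Python list, and the function names ops_row_key and scan_class are hypothetical. The range scan is emulated with binary search over the sorted keys, which mirrors how HBase keeps rows ordered by key.

```python
import bisect

SEP = "|"  # assumed separator between object and subject in the compound key


def ops_row_key(s, p, o):
    """Row key in the modified T_o_ps: object|subject for type triples,
    plain object for all other predicates."""
    return f"{o}{SEP}{s}" if p == "type" else o


def scan_class(sorted_keys, cls):
    """Emulate a range scan over lexicographically sorted row keys and return
    the subjects of all compound keys for class cls.

    Python compares str by code point, which matches HBase's byte order for
    ASCII keys.
    """
    start = cls + SEP
    # The character right after SEP bounds the scan: every key with the
    # prefix "cls|" sorts before "cls" + chr(ord(SEP) + 1).
    stop = cls + chr(ord(SEP) + 1)
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return [key[len(start):] for key in sorted_keys[lo:hi]]


triples = [
    ("Alex", "type", "Person"),
    ("Bob", "type", "Person"),
    ("Acme", "type", "Company"),
    ("Alex", "knows", "Bob"),
]
# Distinct row keys, sorted as HBase would store them.
row_keys = sorted({ops_row_key(s, p, o) for s, p, o in triples})
print(row_keys)                          # ['Bob', 'Company|Acme', 'Person|Alex', 'Person|Bob']
print(scan_class(row_keys, "Person"))    # ['Alex', 'Bob']
```

With the plain design, both Person triples would share the single row Person. With the compound key, each instance gets its own row, so HBase can distribute a large class over several regions, while one range scan still returns every instance of the class.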
5.6 MAPSIN JOIN
The indexing capabilities of HBase lay the foundation for our Map-Side Index
Nested Loop Join (MAPSIN), which improves the performance of selective queries.
It retains the flexibility of reduce-side joins while gaining the efficiency of a
map-side join, without any changes to the underlying frameworks. We start by
introducing the base case of our join technique, followed by our strategy for
cascading a sequence of joins. Finally, we propose optimizations for multiway
joins and one-pattern queries.
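The general idea of an index nested loop join executed inside a map function can be illustrated before the details. The Python sketch below is a simplified illustration under our own assumptions, not the MAPSIN implementation:

- A subject-keyed dictionary stands in for an HBase table.
- Each call of map_task plays the role of one map task over a split of the input mappings.
- All function names are hypothetical.
- Cascading and multiway joins are not modelled.

```python
from collections import defaultdict


def is_var(term):
    return term.startswith("?")


def build_subject_index(triples):
    """Hypothetical stand-in for an HBase table keyed by subject."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def lookup(pattern, index, triples):
    """Yield solution mappings for one triple pattern.

    A bound subject becomes a single index lookup; otherwise all triples are scanned.
    """
    s, p, o = pattern
    if is_var(s):
        candidates = triples
    else:
        candidates = [(s, pp, oo) for pp, oo in index.get(s, [])]
    for triple in candidates:
        mapping = {}
        for term, value in zip(pattern, triple):
            if is_var(term):
                if mapping.setdefault(term, value) != value:
                    break  # repeated variable bound to two different values
            elif term != value:
                break  # constant in the pattern does not match
        else:
            yield mapping


def substitute(pattern, mapping):
    """Replace variables already bound by mapping with their values."""
    return tuple(mapping.get(t, t) if is_var(t) else t for t in pattern)


def map_task(p1_mappings, p2, index, triples):
    """One map task of an index nested loop join: for every input mapping of p1,
    look up the matching triples of p2 directly in the index, with no shuffle or
    reduce phase. Substituting shared variables first guarantees that each merged
    pair of mappings is compatible (they agree on all shared variables)."""
    for m1 in p1_mappings:
        for m2 in lookup(substitute(p2, m1), index, triples):
            yield {**m1, **m2}


triples = [
    ("Alex", "knows", "Bob"),
    ("Bob", "type", "Person"),
    ("Alex", "knows", "Carol"),
    ("Carol", "type", "Robot"),
]
index = build_subject_index(triples)
p1_mappings = list(lookup(("?x", "knows", "?y"), index, triples))
print(list(map_task(p1_mappings, ("?y", "type", "Person"), index, triples)))
# [{'?x': 'Alex', '?y': 'Bob'}]
```

The design choice is that the join variable ?y is already bound when the second pattern is evaluated. Each lookup therefore hits a single row, which is what makes this kind of join attractive for selective queries.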
5.6.1 Base Case
To compute the join between two triple patterns, p1 ⋈ p2, we have to merge the
compatible mappings for p1 and p2. Therefore, it is necessary that subsets of both