multisets of mappings are brought together such that all compatible mappings can
be processed on the same machine. Our MAPSIN join technique computes the join
between p1 and p2 in a single map phase. At the beginning, the map phase is initialized
with a parallel distributed HBase table scan for the first triple pattern p1,
where each machine retrieves only those mappings that are locally available. This
is achieved by utilizing a mechanism for allocating local records to map functions,
which is supported by the MapReduce input format for HBase. The map function is
invoked for each retrieved mapping μ1 for p1. To compute the partial join between p1
and p2 for the given mapping μ1, the map function needs to retrieve those mappings
for p2 that are compatible with μ1 based on the shared variables between p1 and p2. At
this point, the map function utilizes the input mapping μ1 to substitute the shared
variables in p2, that is, the join variables. The substituted triple pattern, p2^sub = μ1(p2),
is then used to retrieve the compatible mappings with a table lookup in HBase following
the triple pattern mapping outlined in Table 5.4. Since there is no guarantee
that the corresponding HBase entries reside on the same machine, the results of the
request generally have to be transferred over the network. However, in contrast to a
reduce-side join approach, where a large amount of data is transferred over the network, we only
transfer the data that is actually needed. Finally, the computed multiset of mappings is
stored in HDFS.
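The substitution step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the chapter's implementation: mappings are modeled as plain dictionaries, triple patterns as 3-tuples whose variables start with "?", and the names substitute and compatible are hypothetical helpers introduced here.

```python
from typing import Dict, Tuple

# Assumption: a mapping binds variable names (e.g. "?article") to values.
Mapping = Dict[str, str]
# Assumption: a triple pattern is a 3-tuple; terms starting with "?" are variables.
TriplePattern = Tuple[str, str, str]


def substitute(pattern: TriplePattern, mu: Mapping) -> TriplePattern:
    """Replace every variable of `pattern` that is bound in `mu` by its value."""
    return tuple(
        mu.get(term, term) if term.startswith("?") else term
        for term in pattern
    )


def compatible(mu1: Mapping, mu2: Mapping) -> bool:
    """Two mappings are compatible if they agree on all variables they share."""
    return all(mu2[var] == val for var, val in mu1.items() if var in mu2)


p2 = ("?article", "author", "?author")
mu1 = {"?article": "article1", "?title": "PigSPARQL"}

# The join variable ?article is replaced; ?author stays open for the lookup.
p2_sub = substitute(p2, mu1)
print(p2_sub)  # ('article1', 'author', '?author')
```

Because the join variables are already fixed in the substituted pattern, every mapping returned by a lookup for it is compatible with μ1 by construction, so no separate compatibility check is needed inside the map function.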
Figure 5.7 shows an example of the base case, illustrating the join between the first
two triple patterns of the SPARQL query in Figure 5.6. While the mappings for the
first triple pattern (?article, title, ?title) are retrieved locally using a distributed table
scan (steps 1 and 2), the compatible mappings for (?article, author, ?author) are requested
within the map function (step 3), and the resulting set of mappings is stored in HDFS
(step 4).
[Figure 5.7 shows Node 1, Node 2, and Node 3 on top of a NoSQL storage system, with output written to HDFS:
Step 1: SCAN for local mappings: ?article title ?title
Step 2 (map inputs): ?article=article1 ?title="PigSPARQL"; ?article=article2 ?title="RDFPath"
Step 3: GET bindings: article1 author ?author; GET bindings: article2 author ?author
Step 4 (map outputs, stored in HDFS):
  ?article=article1 ?title="PigSPARQL" ?author=Alex
  ?article=article1 ?title="PigSPARQL" ?author=Martin
  ?article=article2 ?title="RDFPath" ?author=Martin
  ?article=article2 ?title="RDFPath" ?author=Alex]
FIGURE 5.7
MAPSIN join base case for the first two triple patterns of query in Figure 5.6.
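The four steps of Figure 5.7 can be replayed on a single machine with a small Python sketch. It is a simplified stand-in under stated assumptions: an in-memory list of triples (TRIPLES) takes the place of the HBase table, lookup plays the role of both the SCAN and the GET requests, and the function names mapsin_map and mapsin_join are hypothetical. Distribution, data locality, and the HBase storage schema of Table 5.4 are not modeled, and patterns that repeat a variable are not handled.

```python
# Assumption: toy data taken from Figure 5.7, held in memory instead of HBase.
TRIPLES = [
    ("article1", "title", "PigSPARQL"),
    ("article2", "title", "RDFPath"),
    ("article1", "author", "Alex"),
    ("article1", "author", "Martin"),
    ("article2", "author", "Martin"),
    ("article2", "author", "Alex"),
]


def is_var(term):
    """Terms starting with '?' are treated as variables."""
    return term.startswith("?")


def lookup(pattern):
    """Stand-in for an HBase SCAN/GET: yield one mapping per matching triple."""
    for triple in TRIPLES:
        if all(is_var(p) or p == t for p, t in zip(pattern, triple)):
            yield {p: t for p, t in zip(pattern, triple) if is_var(p)}


def mapsin_map(mu1, p2):
    """Map function: substitute the join variables in p2, then look up matches."""
    p2_sub = tuple(mu1.get(term, term) for term in p2)
    for mu2 in lookup(p2_sub):          # step 3: GET bindings
        yield {**mu1, **mu2}            # merge the two compatible mappings


def mapsin_join(p1, p2):
    """Join p1 and p2 in a single pass over the mappings for p1."""
    output = []
    for mu1 in lookup(p1):              # steps 1 and 2: scan, map inputs
        output.extend(mapsin_map(mu1, p2))
    return output                       # step 4: would be written to HDFS


result = mapsin_join(("?article", "title", "?title"),
                     ("?article", "author", "?author"))
for mapping in result:
    print(mapping)
```

On this toy data the sketch produces four mappings, two per article, matching the map outputs listed in the figure. Only the author triples that match a substituted pattern are ever fetched, which is the point of the comparison with a reduce-side join.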