execution. This vertical partitioning can be done once in advance using a
single MapReduce job and requires no additional disk space. All RDF triples
with the same predicate are stored in the same partition, and every predicate
has its own partition. For queries with an unbound predicate, however, all
partitions have to be processed, which corresponds to processing the
unpartitioned RDF data.
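The partitioning scheme can be illustrated with a minimal in-memory sketch. It is not the MapReduce job itself; the function names `vertical_partition` and `scan` and the sample triples are hypothetical and serve only to show why a bound predicate touches one partition while an unbound predicate touches all of them.

```python
from collections import defaultdict


def vertical_partition(triples):
    """Group (s, p, o) triples by predicate, storing only (s, o) per partition."""
    partitions = defaultdict(list)
    for s, p, o in triples:
        # The predicate is implied by the partition name, so it is not stored
        # again; this is why the partitioned data needs no extra space.
        partitions[p].append((s, o))
    return dict(partitions)


def scan(partitions, predicate=None):
    """Return matching triples; an unbound predicate reads every partition."""
    if predicate is not None:
        return [(s, predicate, o) for s, o in partitions.get(predicate, [])]
    return [(s, p, o) for p, pairs in partitions.items() for s, o in pairs]


# Hypothetical sample data (not taken from the source).
triples = [
    ("John", "knows", "Peter"),
    ("John", "age", 27),
    ("Mary", "knows", "Peter"),
    ("Mary", "age", 17),
    ("John", "mbox", "john@example.org"),
]
partitions = vertical_partition(triples)
print(sorted(partitions))  # ['age', 'knows', 'mbox']
```

Calling `scan(partitions, "age")` reads a single partition, whereas `scan(partitions)` reconstructs the full triple set, mirroring the cost of processing the unpartitioned data.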
5.3.4 Example
Figure 5.4 shows the algebra tree after optimization (pushing Filter execution before
LeftJoin) for the SPARQL query of Section 5.21 (Figure 5.1). The tree is traversed
bottom-up and translated into the following sequence of Pig Latin commands,
assuming a vertical partitioning of the RDF data.
-- Left BGP (1)
knows = LOAD 'rdf/knows' USING RDFLoader() AS (s,o);
age = LOAD 'rdf/age' USING RDFLoader() AS (s,o);
f1 = FILTER knows BY o == 'Peter';
t1 = FOREACH f1 GENERATE s AS person;
t2 = FOREACH age GENERATE s AS person, o AS age;
j1 = JOIN t1 BY person, t2 BY person;
BGP1 = FOREACH j1 GENERATE
    t1::person AS person, t2::age AS age;

-- FILTER (2)
F = FILTER BGP1 BY age >= 18;

-- Right BGP (3)
mbox = LOAD 'rdf/mbox' USING RDFLoader() AS (s,o);
BGP2 = FOREACH mbox GENERATE s AS person, o AS mb;

-- LEFTJOIN (4)
lj = JOIN F BY person LEFT OUTER, BGP2 BY person;
LJ = FOREACH lj GENERATE F::person AS person,
    F::age AS age, BGP2::mb AS mb;
STORE LJ INTO 'output' USING resultWriter();
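To make the data flow of the four steps concrete, the following sketch emulates them on small in-memory lists of (s, o) pairs. It only mirrors the logic of the Pig Latin commands and says nothing about how Pig executes them on Hadoop; the function name `run_example` and all sample data are hypothetical.

```python
def run_example(knows, age, mbox):
    """Emulate steps (1) to (4) on lists of (s, o) pairs."""
    # (1) Left BGP: persons who know Peter, joined with their age.
    t1 = [s for s, o in knows if o == "Peter"]
    bgp1 = [(person, a) for person in t1 for s, a in age if s == person]

    # (2) Filter: keep persons aged 18 or older.
    f = [(person, a) for person, a in bgp1 if a >= 18]

    # (3) Right BGP: mailbox of each person.
    bgp2 = [(s, o) for s, o in mbox]

    # (4) LeftJoin: keep every filtered row; mb is None without a match.
    result = []
    for person, a in f:
        matches = [mb for p, mb in bgp2 if p == person]
        if matches:
            result.extend((person, a, mb) for mb in matches)
        else:
            result.append((person, a, None))
    return result


# Hypothetical sample data (not taken from the source).
knows = [("John", "Peter"), ("Mary", "Peter"), ("Sue", "Peter"), ("Bob", "Alice")]
age = [("John", 27), ("Mary", 17), ("Sue", 30), ("Bob", 40)]
mbox = [("Sue", "sue@example.org"), ("Mary", "mary@example.org")]

result = run_example(knows, age, mbox)
print(result)  # [('John', 27, None), ('Sue', 30, 'sue@example.org')]
```

John has no mailbox but is kept with an empty `mb` value, which is the behavior the LEFT OUTER join provides for the optional part of the query. Mary is removed by the filter before the join, so her mailbox never reaches the join input.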
LeftJoin (4)
    Filter (2)
        BGP (1)
    BGP (3)

FIGURE 5.4 SPARQL algebra tree.
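The bottom-up traversal can be sketched as a post-order walk over the algebra tree. The tree shape used here (LeftJoin over Filter and the right BGP, with Filter over the left BGP) is an assumption inferred from the Pig Latin commands; the `Node` class and the `bottom_up` function are hypothetical helpers, not part of any translator described in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A SPARQL algebra operator with its child operators."""
    op: str
    children: list = field(default_factory=list)


def bottom_up(node):
    """Return operator names in post-order: children first, then the node."""
    order = []
    for child in node.children:
        order.extend(bottom_up(child))
    order.append(node.op)
    return order


# Assumed shape of the tree in Figure 5.4.
tree = Node("LeftJoin", [
    Node("Filter", [Node("BGP (left)")]),
    Node("BGP (right)"),
])

order = bottom_up(tree)
print(order)  # ['BGP (left)', 'Filter', 'BGP (right)', 'LeftJoin']
```

The resulting order matches the numbering (1) to (4) of the translated commands: each operator is emitted only after the relations it consumes have been defined.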