Large-Scale RDF Processing with MapReduce - Large Scale and Big Data: Processing and Management


execution. This vertical partitioning can be done once in advance using a single MapReduce job and requires no additional disk space. All RDF triples with the same predicate are stored in the same partition, and every predicate has its own partition. For queries with an unbound predicate (a variable in the predicate position), all partitions have to be processed, which corresponds to processing the unpartitioned RDF data.
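To make the partitioning scheme concrete, here is a minimal in-memory Python sketch, assuming triples are plain (subject, predicate, object) tuples. The sample triples and the helper names `vertical_partition` and `partitions_to_scan` are hypothetical illustrations, not part of the original system; a real deployment would run the grouping as a MapReduce job over files on the cluster.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not from the original text.
TRIPLES = [
    ("Alice", "knows", "Peter"),
    ("Alice", "age", "25"),
    ("Bob", "knows", "Peter"),
    ("Bob", "age", "17"),
    ("Alice", "mbox", "alice@example.org"),
]


def vertical_partition(triples):
    """Group triples by predicate, keeping only (subject, object) pairs.

    Conceptually a map step keys each triple by its predicate and a reduce
    step writes each key's pairs to one partition; here both are simulated
    in a single in-memory loop.
    """
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[p].append((s, o))
    return dict(partitions)


def partitions_to_scan(partitions, predicate=None):
    """Return the partitions a triple pattern must read.

    A bound predicate touches at most one partition; an unbound predicate
    (predicate=None) must read every partition, i.e. the whole data set.
    """
    if predicate is None:
        return list(partitions)
    return [predicate] if predicate in partitions else []


parts = vertical_partition(TRIPLES)
print(sorted(parts))                      # ['age', 'knows', 'mbox']
print(partitions_to_scan(parts, "age"))   # ['age']
print(sorted(partitions_to_scan(parts)))  # ['age', 'knows', 'mbox']
```

Because the predicate is implied by the partition name, each triple is stored exactly once as an (s, o) pair, which is why this layout needs no extra space.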

5.3.4 Example

Figure 5.4 shows the algebra tree after optimization (executing the Filter before the LeftJoin) for the SPARQL query of Section 5.2.1 (Figure 5.1). The tree is traversed bottom-up and translated into the following sequence of Pig Latin commands, assuming a vertical partitioning of the RDF data.


-- (1) Left BGP: only the 'knows' and 'age' partitions are read
knows = LOAD 'rdf/knows' USING RDFLoader() AS (s,o);
age   = LOAD 'rdf/age' USING RDFLoader() AS (s,o);
f1 = FILTER knows BY o == 'Peter';          -- persons who know Peter
t1 = FOREACH f1 GENERATE s AS person;
t2 = FOREACH age GENERATE s AS person, o AS age;
j1 = JOIN t1 BY person, t2 BY person;       -- join on the shared variable
BGP1 = FOREACH j1 GENERATE
    t1::person AS person, t2::age AS age;
-- (2) FILTER: executed before the LeftJoin
F = FILTER BGP1 BY age >= 18;
-- (3) Right BGP
mbox = LOAD 'rdf/mbox' USING RDFLoader() AS (s,o);
BGP2 = FOREACH mbox GENERATE s AS person, o AS mb;
-- (4) LEFTJOIN: persons without an mbox are kept with a null mb
lj = JOIN F BY person LEFT OUTER, BGP2 BY person;
LJ = FOREACH lj GENERATE F::person AS person,
    F::age AS age, BGP2::mb AS mb;
STORE LJ INTO 'output' USING resultWriter();
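The data flow of the listing can be mimicked in plain Python to check its semantics on toy data. This is a hedged sketch, not output of the translator: the sample data and the function name `run_query` are hypothetical, ages are assumed to be already cast to integers, and the joins are naive nested loops rather than Pig's distributed joins.

```python
# Hypothetical vertically partitioned data: predicate -> list of (s, o) pairs.
# Ages are assumed to be integers already (RDF literals would need a cast).
DATA = {
    "knows": [("Alice", "Peter"), ("Bob", "Peter"), ("Eve", "Peter"), ("Carol", "Dave")],
    "age": [("Alice", 25), ("Bob", 17), ("Carol", 30), ("Eve", 40)],
    "mbox": [("Bob", "bob@example.org"), ("Eve", "eve@example.org")],
}


def run_query(data):
    # (1) Left BGP: subjects that know Peter, joined with their age.
    t1 = [s for s, o in data["knows"] if o == "Peter"]
    bgp1 = [(p, age) for p in t1 for q, age in data["age"] if p == q]

    # (2) Filter, executed before the LeftJoin as in the optimized tree.
    f = [(p, age) for p, age in bgp1 if age >= 18]

    # (3) Right BGP: (person, mb) pairs from the mbox partition.
    bgp2 = data["mbox"]

    # (4) LeftJoin: keep every filtered row; mb is None when no mbox matches,
    # analogous to the null produced by a LEFT OUTER join.
    result = []
    for p, age in f:
        matches = [mb for q, mb in bgp2 if q == p]
        if matches:
            result.extend((p, age, mb) for mb in matches)
        else:
            result.append((p, age, None))
    return result


print(run_query(DATA))
# [('Alice', 25, None), ('Eve', 40, 'eve@example.org')]
```

Bob has an mbox but is removed by the filter before the LeftJoin, while Alice survives the filter and is kept without an mbox, which is exactly the behavior the outer join is meant to preserve.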

FIGURE 5.4 SPARQL algebra tree. [Tree: LeftJoin (4) at the root; its left input is Filter (2) applied to BGP (1), its right input is BGP (3).]
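The bottom-up traversal that yields steps (1) to (4) is a post-order walk: every child is translated before its parent. Below is a minimal Python sketch, assuming a hypothetical `Node` class whose labels follow Figure 5.4; a real translator would emit Pig Latin statements where this one only records the operator label.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical algebra-tree node: an operator label plus its inputs."""
    op: str
    children: list[Node] = field(default_factory=list)


def bottom_up(node: Node, order: list[str] | None = None) -> list[str]:
    """Post-order traversal: visit all children before their parent."""
    if order is None:
        order = []
    for child in node.children:
        bottom_up(child, order)
    order.append(node.op)  # a translator would generate Pig Latin here
    return order


# Tree of Figure 5.4, with the Filter already below the LeftJoin.
tree = Node("LeftJoin (4)", [
    Node("Filter (2)", [Node("BGP (1)")]),
    Node("BGP (3)"),
])

print(bottom_up(tree))  # ['BGP (1)', 'Filter (2)', 'BGP (3)', 'LeftJoin (4)']
```

The resulting order matches the numbering of the Pig Latin listing, since each operator's inputs must exist as relations before the operator itself can be expressed.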
