and LOSplitOutput are responsible for splitting the triple relation into vertical partitions based on the properties in a graph pattern. That is, four LOSplitOutput operators are linked with an LOSplit, indicating that the input relation T will be split into four subrelations whose properties are p1, p2, p3, and p4, respectively. Two LOJoin operators follow, one for each of the star joins SJ1 and SJ2. Then, another LOJoin operator is used to join the two star subpatterns, and finally, an LOStore operator stores the output of LOJoin to disk. The complete logical plan is illustrated in Figure 6.4a.
Program 6.1: A Pig Latin Program for the Example Query
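-- Load space-separated triples as (subject, property, object).
T = Load 'input.nt' using PigStorage(' ') as (S,P,O) ;
-- Vertically partition T by property.
SPLIT T into P1 if P eq 'P1', P2 if P eq 'P2', P3 if P eq 'P3', P4 if P eq 'P4' ;
-- Star joins on the subject.
SJ1 = JOIN P1 by S, P2 by S ;
SJ2 = JOIN P3 by S, P4 by S ;
-- Join the two star subpatterns.
J1 = JOIN SJ1 by $0, SJ2 by $2 ;
-- 'output' is an assumed placeholder path; the original omits the destination.
STORE J1 into 'output' ;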
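The data flow of Program 6.1 can be mimicked in plain Python to see what each relation contains. This is only a minimal sketch, not Pig itself: the sample triples and the helper names `split_by_property` and `join` are made up for illustration, and the joins are naive nested loops rather than anything Pig would actually execute.

```python
from collections import defaultdict

# Hypothetical sample triples (S, P, O); not taken from the original text.
T = [
    ("a", "P1", "x"),
    ("a", "P2", "y"),
    ("b", "P3", "a"),
    ("b", "P4", "z"),
    ("c", "P1", "w"),  # has no matching P2 triple, so it drops out of SJ1
]

def split_by_property(triples, properties):
    """Vertical partitioning: one subrelation per property (like SPLIT)."""
    parts = defaultdict(list)
    for s, p, o in triples:
        if p in properties:
            parts[p].append((s, p, o))
    return parts

def join(left, right, left_col, right_col):
    """Naive equi-join; each output tuple concatenates a left and a right tuple."""
    return [l + r for l in left for r in right if l[left_col] == r[right_col]]

parts = split_by_property(T, {"P1", "P2", "P3", "P4"})
SJ1 = join(parts["P1"], parts["P2"], 0, 0)  # star join on Subject
SJ2 = join(parts["P3"], parts["P4"], 0, 0)  # star join on Subject
J1 = join(SJ1, SJ2, 0, 2)                   # SJ1.$0 = SJ2.$2, as in Program 6.1

print(J1)
```

With this sample input, the subject of the first star pattern (`a`) equals the object of the `P3` triple in the second, so `J1` holds a single 12-field tuple.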
6.4.2 Physical Plan Translation
The logical plan is translated to a physical plan as shown in Figure 6.4b. The logical operator LOLoad is mapped to a physical operator POLoad (Pig uses the LO and PO prefixes for logical and physical operators, respectively). The operators LOSplit and LOSplitOutput are then transformed into a physical operator POFilter with subexpression operators that select only the triples matching each triple pattern. That is, a POFilter denotes a selection operation on T based on conditions that are described with expression operators, for example, POProject for σ and P, POConstant for p1, and EqualTo for = in σ_{P = 'p1'}(T). Pig then maps the logical operator LOJoin into a set of physical operators: POLocalRearrange, for annotating triples from POFilter with their subjects; POUnion, for unioning the annotated triples; and POJoinPackage, for packaging (joining) triples that share a Subject into n-tuples. Each gray-dotted box in Figure 6.4b and c denotes a set of operators that annotate the triples matching a triple pattern with their Subject. For example, the POFilter and POLocalRearrange in the first box annotate the triples matching the triple pattern with the property p1 (the operators for the other properties, p2, p3, and p4, are omitted). As a final step, the logical operator LOStore is mapped into a physical operator POStore. The mappings between logical and physical operators are denoted by black-dotted boxes in Figure 6.4a and b.
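To make the roles of these physical operators more concrete, the following Python sketch imitates a filter expression tree and the rearrange/union/package steps of one star join. The classes and functions are toy stand-ins named after the operators in the text; they are assumptions for illustration and do not mirror Pig's real Java classes or signatures, and the sample triples are invented.

```python
import itertools
from collections import defaultdict

# Toy expression operators (illustrative only, not Pig's API).
class Project:
    def __init__(self, index):
        self.index = index
    def eval(self, tup):
        return tup[self.index]          # pick one column, e.g. P

class Constant:
    def __init__(self, value):
        self.value = value
    def eval(self, tup):
        return self.value               # a literal such as 'p1'

class EqualTo:
    def __init__(self, lhs, rhs):
        self.lhs, self.rhs = lhs, rhs
    def eval(self, tup):
        return self.lhs.eval(tup) == self.rhs.eval(tup)

def po_filter(triples, predicate):
    """Selection: keep only triples for which the expression tree is true."""
    return [t for t in triples if predicate.eval(t)]

def local_rearrange(triples, key_index, input_id):
    """Annotate each triple with its join key and the input it came from."""
    return [(t[key_index], input_id, t) for t in triples]

def union(*inputs):
    """Merge the annotated triples of all inputs into one stream."""
    return [rec for inp in inputs for rec in inp]

def join_package(annotated, num_inputs):
    """Group by key, then combine one triple per input into an n-tuple."""
    groups = defaultdict(lambda: [[] for _ in range(num_inputs)])
    for key, input_id, t in annotated:
        groups[key][input_id].append(t)
    result = []
    for key in sorted(groups):
        # The cross product is empty if any input has no triple for this key.
        for combo in itertools.product(*groups[key]):
            result.append(sum(combo, ()))
    return result

# Hypothetical input relation T of (S, P, O) triples.
T = [("a", "p1", "x"), ("a", "p2", "y"), ("c", "p1", "w")]
p1 = local_rearrange(po_filter(T, EqualTo(Project(1), Constant("p1"))), 0, 0)
p2 = local_rearrange(po_filter(T, EqualTo(Project(1), Constant("p2"))), 0, 1)
SJ1 = join_package(union(p1, p2), 2)
print(SJ1)
```

In this sketch, subject `c` has a `p1` triple but no `p2` triple, so the packaging step produces no output for it, which matches the inner-join behavior of a star join.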
6.4.3 MapReduce Plan Translation
In the final phase of data flow compilation, a physical plan is decomposed into MR
jobs with job dependencies recorded. This is shown in Figure 6.4c. Besides deciding
which MR job an operator should be assigned to, the compiler must also determine
whether the operator executes in the Map or Reduce phase. For example, while most