and LOSplitOutput are responsible for splitting the triple relation into vertical
partitions based on the properties in a graph pattern. Here, four LOSplitOutput
operators are linked to one LOSplit, indicating that the input relation T will be
split into four subrelations whose properties are p1, p2, p3, and p4, respectively. Two
LOJoin operators follow, one for each of the star joins SJ1 and SJ2. Another LOJoin
operator then joins the two star subpatterns, and finally an LOStore operator
stores the output of that LOJoin to disk. The complete logical plan is illustrated in
Figure 6.4a.
Program 6.1: A Pig Latin Program for the Example Query
T = Load 'input.nt' using PigStorage(' ') as (S,P,O) ;
SPLIT T into P1 if P eq 'P1', P2 if P eq 'P2', P3 if P eq 'P3', P4 if P eq 'P4' ;
SJ1 = JOIN P1 by S, P2 by S ;
SJ2 = JOIN P3 by S, P4 by S ;
J1 = JOIN SJ1 by $0, SJ2 by $2 ;
STORE J1 ;
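To make the data flow of Program 6.1 concrete, the following Python sketch mimics its three stages on a handful of in-memory triples: the split into one subrelation per property, the two star joins on Subject, and the final join of SJ1 by $0 with SJ2 by $2. The helper names (split_by_property, join) and the sample triples are assumptions made for illustration only; this is a model of what the program computes, not of how Pig executes it.

```python
from collections import defaultdict

def split_by_property(triples, properties):
    """Vertical partitioning: one subrelation per property (the SPLIT step)."""
    parts = {p: [] for p in properties}
    for s, p, o in triples:
        if p in parts:
            parts[p].append((s, p, o))
    return parts

def join(left, right, left_col, right_col):
    """Equi-join of two lists of tuples; each result is the concatenated tuple."""
    index = defaultdict(list)
    for r in right:
        index[r[right_col]].append(r)
    return [l + r for l in left for r in index.get(l[left_col], [])]

# Hypothetical sample data: (Subject, Property, Object)
triples = [
    ("a", "P1", "x"),
    ("a", "P2", "y"),
    ("b", "P3", "a"),
    ("b", "P4", "z"),
    ("c", "P1", "w"),  # c has no P2 triple, so it drops out of SJ1
]

parts = split_by_property(triples, ["P1", "P2", "P3", "P4"])
sj1 = join(parts["P1"], parts["P2"], 0, 0)  # star join on Subject
sj2 = join(parts["P3"], parts["P4"], 0, 0)  # star join on Subject
j1 = join(sj1, sj2, 0, 2)                   # SJ1 by $0, SJ2 by $2
```

With this sample data, the last join links the subject of the first star pattern to the object of the P3 triple in the second, which is what joining on column $0 of SJ1 and column $2 of SJ2 expresses.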
6.4.2 Physical Plan Translation
The logical plan is translated to a physical plan as shown in Figure 6.4b. The logical
operator LOLoad is mapped to a physical operator POLoad (Pig uses the LO and PO
prefixes for logical and physical operators, respectively). The operators LOSplit
and LOSplitOutput are then transformed into a physical operator POFilter
with subexpression operators that select only the triples matching each triple pattern.
That is, a POFilter denotes a selection on T whose condition is described
with expression operators, for example, POProject for σ and
P, POConstant for p1, and EqualTo for = in σ_{P = 'p1'}(T). Pig then maps the logical
operator LOJoin into a set of physical operators: POLocalRearrange, which
annotates the triples from POFilter with their subjects; POUnion, which unions
the annotated triples; and POJoinPackage, which packages
(joins) the triples that share a Subject into n-tuples. Each gray-dotted box in Figure 6.4b
and c denotes a set of operators that annotate the triples matching a triple pattern with
their Subject; for example, the POFilter and POLocalRearrange in the first box
annotate the triples matching the triple pattern with the property p1 (the operators
for the other properties p2, p3, and p4 are omitted). As a final step, the logical
operator LOStore is mapped into a physical operator POStore. The mappings between
logical and physical operators are denoted by black-dotted boxes in Figure 6.4a and b.
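The roles of these physical operators can be sketched in Python as follows. This is a deliberately simplified model under stated assumptions: the function names only echo the Pig operator names for readability, the (key, branch, tuple) annotation format is invented for the sketch, and none of it reflects Pig's actual Java classes or signatures.

```python
from collections import defaultdict
from itertools import product

# Expression operators, modelled as small callables over one triple.
def po_project(col):
    return lambda t: t[col]            # pick a column, e.g. P

def po_constant(value):
    return lambda t: value             # a literal, e.g. 'p1'

def equal_to(lhs, rhs):
    return lambda t: lhs(t) == rhs(t)  # the '=' in the selection condition

def po_filter(relation, predicate):
    """Selection: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def po_local_rearrange(relation, key_col, branch):
    """Annotate each tuple with its join key and its input branch index."""
    return [(t[key_col], branch, t) for t in relation]

def po_union(*annotated):
    """Merge the annotated tuples of all branches into one stream."""
    return [x for rel in annotated for x in rel]

def po_join_package(annotated, n_branches):
    """Group by key, then combine one tuple per branch into an n-tuple."""
    groups = defaultdict(lambda: [[] for _ in range(n_branches)])
    for key, branch, t in annotated:
        groups[key][branch].append(t)
    out = []
    for bags in groups.values():
        # An empty bag for any branch yields no combinations (inner join).
        for combo in product(*bags):
            out.append(sum(combo, ()))
    return out

# Hypothetical sample relation T with columns (S, P, O).
S, P, O = 0, 1, 2
T = [("a", "p1", "x"), ("a", "p2", "y"), ("c", "p1", "w")]

p1 = po_filter(T, equal_to(po_project(P), po_constant("p1")))
p2 = po_filter(T, equal_to(po_project(P), po_constant("p2")))
sj1 = po_join_package(
    po_union(po_local_rearrange(p1, S, 0), po_local_rearrange(p2, S, 1)),
    n_branches=2,
)
```

In this model, the pair of po_filter and po_local_rearrange calls for one property corresponds to one gray-dotted box in the figure, and po_union followed by po_join_package corresponds to the remainder of a translated LOJoin.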
6.4.3 MapReduce Plan Translation
In the final phase of data flow compilation, a physical plan is decomposed into MR
jobs with job dependencies recorded. This is shown in Figure 6.4c. Besides deciding
which MR job an operator should be assigned to, the compiler must also determine
whether the operator executes in the Map or Reduce phase. For example, while most