If a relation has no tuples matching a particular key, the bag for that relation is empty.
For example, since no one has bought a scarf (with ID 1), the second bag in the tuple for
that row is empty. This is an example of an outer join, which is the default type for
COGROUP. It can be made explicit using the OUTER keyword, making this COGROUP
statement the same as the previous one:
D = COGROUP A BY $0 OUTER, B BY $1 OUTER;
You can suppress rows with empty bags by using the INNER keyword, which gives the
COGROUP inner join semantics. The INNER keyword is applied per relation, so the
following suppresses rows only when relation A has no match (dropping the unknown
product 0 here):
grunt> E = COGROUP A BY $0 INNER, B BY $1;
grunt> DUMP E;
(1,{(1,Scarf)},{})
(2,{(2,Tie)},{(Hank,2),(Joe,2)})
(3,{(3,Hat)},{(Eve,3)})
(4,{(4,Coat)},{(Hank,4)})
We can flatten this structure to discover who bought each of the items in relation A:
grunt> F = FOREACH E GENERATE FLATTEN(A), B.$0;
grunt> DUMP F;
(1,Scarf,{})
(2,Tie,{(Hank),(Joe)})
(3,Hat,{(Eve)})
(4,Coat,{(Hank)})
Using a combination of COGROUP, INNER, and FLATTEN (which removes nesting), it's
possible to simulate an (inner) JOIN:
grunt> G = COGROUP A BY $0 INNER, B BY $1 INNER;
grunt> H = FOREACH G GENERATE FLATTEN($1), FLATTEN($2);
grunt> DUMP H;
(2,Tie,Hank,2)
(2,Tie,Joe,2)
(3,Hat,Eve,3)
(4,Coat,Hank,4)
This gives the same result as JOIN A BY $0, B BY $1.
If the join key is composed of several fields, you can specify them all in the BY clauses of
the JOIN or COGROUP statement. Make sure that the number of fields in each BY clause
is the same.
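As a minimal sketch of a multifield key, assume two hypothetical relations that are not part of the running example: P with fields (region, id, name) and S with fields (buyer, region, id). The file names and schemas are invented for illustration only. Each BY clause lists its key fields in parentheses, with two fields on each side:

```pig
-- Hypothetical inputs: file names and schemas are assumptions for illustration.
P = LOAD 'products_by_region.txt' AS (region:chararray, id:int, name:chararray);
S = LOAD 'sales_by_region.txt' AS (buyer:chararray, region:chararray, id:int);

-- Join on the composite key (region, id); both BY clauses list two fields.
J = JOIN P BY (region, id), S BY (region, id);

-- The same composite key in a COGROUP, here using positional references.
K = COGROUP P BY ($0, $1), S BY ($1, $2);
```

In K, the group field of each output tuple is itself a tuple holding both key values, rather than a single scalar as in the one-field examples above.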