Pig - Hadoop: The Definitive Guide


This is a classic inner join, where each match between the two relations corresponds to a row in the result. (It's actually an equijoin because the join predicate is equality.) The result's fields are made up of all the fields of all the input relations.

You should use the general join operator when all the relations being joined are too large to fit in memory. If one of the relations is small enough to fit in memory, you can use a special type of join called a fragment replicate join, which is implemented by distributing the small input to all the mappers and performing a map-side join using an in-memory lookup table against the (fragmented) larger relation. There is a special syntax for telling Pig to use a fragment replicate join: [104]

grunt> C = JOIN A BY $0, B BY $1 USING 'replicated';

The first relation must be the large one, followed by one or more small ones (all of which must fit in memory).
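The mechanism behind a fragment replicate join can be sketched outside Pig. The following Python is only an illustration of the idea, not Pig's implementation: the relations A and B are reconstructed from the DUMP output shown in this section, and the helper names build_lookup and map_side_join, as well as the two-way split of A, are hypothetical.

```python
from collections import defaultdict

# Relations reconstructed from the DUMP output in this section (an assumption).
A = [(2, "Tie"), (4, "Coat"), (3, "Hat"), (1, "Scarf")]             # the "large" relation
B = [("Joe", 2), ("Hank", 4), ("Ali", 0), ("Eve", 3), ("Hank", 2)]  # the "small" relation


def build_lookup(small, key_index):
    """Index the small relation by join key; every mapper would hold a copy of this table."""
    table = defaultdict(list)
    for row in small:
        table[row[key_index]].append(row)
    return table


def map_side_join(fragment, key_index, lookup):
    """Join one fragment (split) of the large relation against the in-memory table."""
    for row in fragment:
        for match in lookup.get(row[key_index], []):
            yield row + match


lookup = build_lookup(B, key_index=1)

# Pretend the large relation is split across two mappers.
fragments = [A[:2], A[2:]]
C = [row for fragment in fragments for row in map_side_join(fragment, 0, lookup)]

for row in sorted(C):
    print(row)
```

Because each fragment is joined independently against the same lookup table, no shuffle or reduce phase is needed, which is why only the small relation has to fit in memory.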

Pig also supports outer joins using a syntax that is similar to SQL's (this is covered for Hive in Outer joins). For example:

grunt> C = JOIN A BY $0 LEFT OUTER, B BY $1;
grunt> DUMP C;
(1,Scarf,,)
(2,Tie,Hank,2)
(2,Tie,Joe,2)
(3,Hat,Eve,3)
(4,Coat,Hank,4)
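To make the left outer semantics concrete, here is a minimal Python sketch. It assumes the relations A and B reconstructed from the output in this section; the function left_outer_join is a hypothetical helper, not part of Pig. Rows of A with no partner in B are kept and padded with None, which corresponds to the empty trailing fields in (1,Scarf,,), while the unmatched B row for key 0 is dropped.

```python
# Relations reconstructed from the DUMP output in this section (an assumption).
A = [(2, "Tie"), (4, "Coat"), (3, "Hat"), (1, "Scarf")]
B = [("Joe", 2), ("Hank", 4), ("Ali", 0), ("Eve", 3), ("Hank", 2)]


def left_outer_join(left, left_key, right, right_key):
    """Keep every left row; pad with None where no right row matches."""
    right_width = len(right[0]) if right else 0
    joined = []
    for l_row in left:
        matches = [r_row for r_row in right if r_row[right_key] == l_row[left_key]]
        if not matches:
            # No partner on the right: emit the left row with null padding.
            matches = [(None,) * right_width]
        joined.extend(l_row + m for m in matches)
    return joined


# Sort by join key to mimic the grouped order of the output above.
C = sorted(left_outer_join(A, 0, B, 1), key=lambda row: row[0])
for row in C:
    print(row)
```

The order of rows that share a key (the two rows for key 2) is not significant and may differ from the DUMP output.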

COGROUP

JOIN always gives a flat structure: a set of tuples. The COGROUP statement is similar to JOIN, but instead creates a nested set of output tuples. This can be useful if you want to exploit the structure in subsequent statements:

grunt> D = COGROUP A BY $0, B BY $1;
grunt> DUMP D;
(0,{},{(Ali,0)})
(1,{(1,Scarf)},{})
(2,{(2,Tie)},{(Hank,2),(Joe,2)})
(3,{(3,Hat)},{(Eve,3)})
(4,{(4,Coat)},{(Hank,4)})

COGROUP generates a tuple for each unique grouping key. The first field of each tuple is the key, and the remaining fields are bags of tuples from the relations with a matching key. The first bag contains the matching tuples from relation A with the same key. Similarly, the second bag contains the matching tuples from relation B with the same key.
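The nested shape that COGROUP produces can be modeled in a few lines of Python. This is a sketch of the semantics only, assuming the relations A and B reconstructed from the output above; cogroup is a hypothetical helper, and Python lists stand in for Pig's unordered bags.

```python
# Relations reconstructed from the DUMP output in this section (an assumption).
A = [(2, "Tie"), (4, "Coat"), (3, "Hat"), (1, "Scarf")]
B = [("Joe", 2), ("Hank", 4), ("Ali", 0), ("Eve", 3), ("Hank", 2)]


def cogroup(left, left_key, right, right_key):
    """Return one (key, left_bag, right_bag) tuple per distinct key in either input."""
    keys = sorted({row[left_key] for row in left} | {row[right_key] for row in right})
    return [
        (
            k,
            [row for row in left if row[left_key] == k],    # bag of matching left tuples
            [row for row in right if row[right_key] == k],  # bag of matching right tuples
        )
        for k in keys
    ]


D = cogroup(A, 0, B, 1)
for key, bag_a, bag_b in D:
    print(key, bag_a, bag_b)
```

A key that appears in only one relation still yields a tuple, with an empty bag for the other relation, as with keys 0 and 1 in the output above.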
