Map: A collection of data items where each item can be looked up by an
associated key.
Example: [ 'name' -> 'John', 'knows' -> { ('Sarah'), ('Bob') } ]
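To make the map type concrete, here is a minimal Python sketch that models a Pig Latin map as a dictionary and a bag as a list of tuples. The variable `person` and the helper `lookup` are hypothetical names chosen for illustration; returning `None` for a missing key is a modelling choice of this sketch.

```python
# Model of a map: string keys, values of arbitrary type.
# A bag is modelled as a list of tuples (order is not significant in a bag).
person = {
    "name": "John",
    "knows": [("Sarah",), ("Bob",)],  # a bag of one-field tuples
}

def lookup(m, key):
    """Return the value stored under key, or None if the key is absent."""
    return m.get(key)

name = lookup(person, "name")                      # a simple value
friends = [t[0] for t in lookup(person, "knows")]  # first field of each tuple
missing = lookup(person, "age")                    # key not present
```

The point of the sketch is that map values can themselves be nested structures such as bags, not only atomic values.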
Operators: A Pig Latin program consists of a sequence of instructions, where
each instruction performs a single data transformation. We briefly introduce the
Pig Latin operators that we used for our translation. The interested reader can find a
more detailed description of Pig Latin in [16].
LOAD deserializes the input data and maps it to the data model of Pig
Latin. The user can implement a User Defined Function (UDF) that defines
how to map an input tuple to a Pig Latin tuple as shown in the following
example. The result of LOAD is a bag of tuples.
people = LOAD 'input' USING myLoad() AS (name, age);
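The role of the load function can be sketched in Python as follows. This is only a model of the idea, not Pig's UDF interface: `my_load`, the tab separator, and the sample input lines are all assumptions made for illustration.

```python
def my_load(line, sep="\t"):
    """Hypothetical loader: map one raw input line to a (name, age) tuple."""
    name, age = line.rstrip("\n").split(sep)
    return (name, int(age))

# Assumed sample input; the real input format depends on the data source.
raw_input_lines = ["John\t25\n", "Sarah\t17\n", "Bob\t19\n"]

# The result of loading is a bag of tuples, modelled here as a list.
people = [my_load(line) for line in raw_input_lines]
```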
FOREACH applies some processing to every tuple of a bag.
It can also be used to project fields or to add new fields to a tuple.
A = FOREACH people GENERATE name, (age >= 18 ? 'adult' : 'minor') AS type;
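The effect of this FOREACH statement can be modelled in Python with a list comprehension. The sample contents of `people` are assumed for illustration.

```python
# Assumed sample bag of (name, age) tuples.
people = [("John", 25), ("Sarah", 17), ("Bob", 19)]

# Per tuple: keep name (projection) and derive a new field "type" from age.
A = [(name, "adult" if age >= 18 else "minor") for name, age in people]
```

Each input tuple yields exactly one output tuple here; the age field is dropped and replaced by the derived type field.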
FILTER removes unwanted tuples from a bag.
B = FILTER people BY age >= 18;
[OUTER] JOIN performs an equi-join or an outer join between bags. It can also be
applied to more than two bags at once (multijoin).
C = JOIN A BY name [LEFT OUTER], B BY name;
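The difference between the equi-join and the left outer variant can be illustrated with a small Python model. The function `join` and the sample bags are hypothetical; the nested-loop strategy is chosen for brevity and says nothing about how Pig executes joins.

```python
def join(left, right, left_key, right_key, left_outer=False):
    """Join two bags of tuples on equality of the fields at the given positions."""
    result = []
    # Width used to pad unmatched left tuples (simplification: 0 if right is empty).
    right_width = len(right[0]) if right else 0
    for l in left:
        matches = [r for r in right if r[right_key] == l[left_key]]
        if matches:
            # Output tuples contain all fields of both inputs.
            result.extend(l + r for r in matches)
        elif left_outer:
            # Left outer join: keep the left tuple, fill the right side with nulls.
            result.append(l + (None,) * right_width)
    return result

A = [("John", "adult"), ("Sarah", "minor"), ("Bob", "adult")]
B = [("John", 25), ("Bob", 19)]

inner = join(A, B, 0, 0)
outer = join(A, B, 0, 0, left_outer=True)
```

In this sample, Sarah has no partner in `B`, so she is absent from `inner` but appears in `outer` with null-padded fields.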
UNION combines two or more bags. Unlike in relational databases,
the schemas of the tuples do not have to match. Mixing schemas is
not recommended in general, however, because the schema information, especially the alias
names of the fields, is lost in such cases.
D = UNION B, C;
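A short Python model shows what a union of bags with different schemas looks like. The sample bags are assumed for illustration: `B` holds two-field tuples and `C` holds four-field tuples.

```python
B = [("John", 25), ("Bob", 19)]                                    # (name, age)
C = [("John", "adult", "John", 25), ("Bob", "adult", "Bob", 19)]  # joined tuples

# Union is modelled as bag concatenation; duplicates are not removed.
D = B + C

# The result mixes tuples of different widths, so no single schema describes it.
arities = sorted({len(t) for t in D})
```

Because `D` contains both two-field and four-field tuples, positional access is the only reliable way to address fields afterwards, which matches the warning about lost alias names.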
SPLIT partitions a bag into two or more bags that do not have to be disjoint
or complete, that is, a tuple can end up in more than one partition or in no
partition at all.
SPLIT people INTO E IF age < 18, F IF age >= 21;
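The following Python sketch models SPLIT to show both properties: tuples that match no condition are dropped, and tuples that match several conditions are copied into several partitions. The helper `split` and the sample data are hypothetical.

```python
# Assumed sample bag of (name, age) tuples.
people = [("John", 25), ("Sarah", 17), ("Bob", 19)]

def split(bag, *predicates):
    """Return one output bag per predicate; each tuple is tested against all of them."""
    return [[t for t in bag if pred(t)] for pred in predicates]

# Same conditions as the statement above: age < 18 and age >= 21.
E, F = split(people, lambda t: t[1] < 18, lambda t: t[1] >= 21)
dropped = [t for t in people if t not in E and t not in F]

# Overlapping conditions: a tuple can land in both partitions.
G, H = split(people, lambda t: t[1] >= 18, lambda t: t[1] < 21)
overlap = [t for t in G if t in H]
```

With these conditions, Bob (age 19) falls into neither `E` nor `F`, while in the second split he appears in both `G` and `H`.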