Table 16-7. A selection of Pig's built-in functions
Category | Function | Description
Eval | AVG | Calculates the average (mean) value of entries in a bag.
Eval | CONCAT | Concatenates byte arrays or character arrays together.
Eval | COUNT | Calculates the number of non-null entries in a bag.
Eval | COUNT_STAR | Calculates the number of entries in a bag, including those that are null.
Eval | DIFF | Calculates the set difference of two bags, returning the tuples that are in one bag but not the other. If the two arguments are not bags, returns a bag containing both if they are not equal; otherwise, returns an empty bag.
Eval | MAX | Calculates the maximum value of entries in a bag.
Eval | MIN | Calculates the minimum value of entries in a bag.
Eval | SIZE | Calculates the size of a type. The size of numeric types is always 1; for character arrays, it is the number of characters; for byte arrays, the number of bytes; and for containers (tuple, bag, map), it is the number of entries.
Eval | SUM | Calculates the sum of the values of entries in a bag.
Eval | TOBAG | Converts one or more expressions to individual tuples, which are then put in a bag. A synonym for {}.
Eval | TOKENIZE | Tokenizes a character array into a bag of its constituent words.
Eval | TOMAP | Converts an even number of expressions to a map of key-value pairs. A synonym for [].
Eval | TOP | Calculates the top n tuples in a bag.
Eval | TOTUPLE | Converts one or more expressions to a tuple. A synonym for ().
Filter | IsEmpty | Tests whether a bag or map is empty.
Load/Store | PigStorage | Loads or stores relations using a field-delimited text format. Each line is broken into fields using a configurable field delimiter (defaults to a tab character) to be stored in the tuple's fields. It is the default storage when none is specified. [a]
Load/Store | TextLoader | Loads relations from a plain-text format. Each line corresponds to a tuple whose single field is the line of text.
Load/Store | JsonLoader, JsonStorage | Loads or stores relations from or to a (Pig-defined) JSON format. Each tuple is stored on one line.
Load/Store | AvroStorage | Loads or stores relations from or to Avro datafiles.
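
Several of the eval functions in Table 16-7 differ only in how they treat nulls or container types, which is easier to see in a small model. The following is a rough Python sketch of those semantics, not Pig code: it assumes a bag can be modeled as a list, a tuple as a Python tuple, a map as a dict, and null as None, and every function name here (count, count_star, size, bag_diff, to_tuple, to_bag, to_map) is a hypothetical stand-in for the corresponding built-in.

```python
# Illustrative model only: bags are lists, tuples are tuples, maps are dicts,
# and Pig's null is None. All names below are hypothetical, not Pig APIs.

def count(bag):
    """Model of COUNT: number of non-null entries in a bag."""
    return sum(1 for entry in bag if entry is not None)

def count_star(bag):
    """Model of COUNT_STAR: number of entries, nulls included."""
    return len(bag)

def size(value):
    """Model of SIZE: 1 for numbers, length for strings, bytes, and containers."""
    if isinstance(value, (int, float)):
        return 1
    return len(value)  # str: characters; bytes: bytes; tuple/list/dict: entries

def bag_diff(bag1, bag2):
    """Model of DIFF for two bags: tuples found in one bag but not the other."""
    only_in_1 = [t for t in bag1 if t not in bag2]
    only_in_2 = [t for t in bag2 if t not in bag1]
    return only_in_1 + only_in_2

def to_tuple(*exprs):
    """Model of TOTUPLE: all expressions become fields of one tuple."""
    return tuple(exprs)

def to_bag(*exprs):
    """Model of TOBAG: each expression becomes its own tuple inside a bag."""
    return [(e,) for e in exprs]

def to_map(*exprs):
    """Model of TOMAP: an even number of expressions, read as key-value pairs."""
    if len(exprs) % 2 != 0:
        raise ValueError("to_map needs an even number of expressions")
    return dict(zip(exprs[0::2], exprs[1::2]))

if __name__ == "__main__":
    temperatures = [12, None, 7]
    print(count(temperatures), count_star(temperatures))
    print(size(42), size("pig"), size({"a": 1, "b": 2}))
    print(bag_diff([(1,), (2,)], [(2,), (3,)]))
    print(to_tuple(1, "a"), to_bag(1, "a"), to_map("k", 1))
```

The contrast to notice is between count and count_star on a bag containing a null, and between to_tuple and to_bag, which take the same arguments but produce one tuple versus a bag of single-field tuples.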