Further Reading
This chapter provided a basic introduction to using Pig. For a more detailed guide, see Programming Pig by Alan Gates (O'Reilly, 2011).
[97] Or as the Pig Philosophy has it, “Pigs eat anything.”
[98] Not to be confused with Pig Latin, the language game. English words are translated into Pig Latin by moving the initial consonant sound to the end of the word and adding an “ay” sound. For example, “pig” becomes “ig-pay,” and “Hadoop” becomes “Adoop-hay.”
[99] Pig Latin does not have a formal language definition as such, but there is a comprehensive guide to the language that you can find through a link on the Pig website.
[100] You sometimes see these terms being used interchangeably in documentation on Pig Latin: for example, “GROUP command,” “GROUP operation,” “GROUP statement.”
[102] Although not relevant for this example, eval functions that operate on a bag may additionally implement Pig's Algebraic or Accumulator interfaces for more efficient processing of the bag in chunks.
[103] There is a more fully featured UDF for doing the same thing in the Piggy Bank called FixedWidthLoader.
[104] There are more keywords that may be used in the USING clause, including 'skewed' (for large datasets with a skewed keyspace), 'merge' (to effect a merge join for inputs that are already sorted on the join key), and 'merge-sparse' (where 1% or less of data is matched). See Pig's documentation for details on how to use these specialized joins.
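As a rough sketch of the syntax only, the specialized joins are selected by changing the string in the USING clause. The relation names, input paths, and schemas below are hypothetical and chosen purely for illustration. Each join type has preconditions of its own (for example, a merge join expects both inputs to be sorted on the join key already), so check Pig's documentation before relying on this.

```pig
-- Hypothetical inputs: the paths, relation names, and schemas are assumptions.
logs  = LOAD 'input/logs'  AS (user_id:chararray, url:chararray);
users = LOAD 'input/users' AS (user_id:chararray, name:chararray);

-- Skewed join: intended for large datasets with a skewed keyspace.
skewed_joined = JOIN logs BY user_id, users BY user_id USING 'skewed';

-- Merge join: assumes both inputs are already sorted on the join key.
merged_joined = JOIN logs BY user_id, users BY user_id USING 'merge';
```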
[105] Tamer Elsayed, Jimmy Lin, and Douglas W. Oard, “Pairwise Document Similarity in Large Collections with MapReduce,” Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, June 2008.