Further Reading
This chapter provided a basic introduction to using Pig. For a more detailed guide, see
Programming Pig by Alan Gates (O'Reilly, 2011).
[ 96 ] History is stored in a file called .pig_history in your home directory.
[ 97 ] Or as the Pig Philosophy has it, “Pigs eat anything.”
[ 98 ] Not to be confused with Pig Latin, the language game. English words are translated into Pig Latin by
moving the initial consonant sound to the end of the word and adding an “ay” sound. For example, “pig”
becomes “ig-pay,” and “Hadoop” becomes “Adoop-hay.”
[ 99 ] Pig Latin does not have a formal language definition as such, but there is a comprehensive guide to the
language that you can find through a link on the Pig website.
[ 100 ] You sometimes see these terms being used interchangeably in documentation on Pig Latin: for example,
“GROUP command,” “GROUP operation,” “GROUP statement.”
[ 101 ] Pig actually comes with an equivalent built-in function called TRIM.
[ 102 ] Although not relevant for this example, eval functions that operate on a bag may additionally implement
Pig's Algebraic or Accumulator interfaces for more efficient processing of the bag in chunks.
[ 103 ] There is a more fully featured UDF for doing the same thing in the Piggy Bank called
FixedWidthLoader.
[ 104 ] There are more keywords that may be used in the USING clause, including 'skewed' (for large
datasets with a skewed keyspace), 'merge' (to effect a merge join for inputs that are already sorted on the join
key), and 'merge-sparse' (where 1% or less of data is matched). See Pig's documentation for details on
how to use these specialized joins.
[ 105 ] Tamer Elsayed, Jimmy Lin, and Douglas W. Oard, “Pairwise Document Similarity in Large Collections
with MapReduce,” Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics,
June 2008.