10.9 Summary
Pig is a higher-level data processing layer on top of Hadoop. Its Pig Latin language
gives programmers a more intuitive way to specify data flows. It supports schemas
for processing structured data, yet it's flexible enough to work with unstructured text
or semistructured XML data. It's extensible through user-defined functions (UDFs). It
vastly simplifies data joining and job chaining, two aspects of MapReduce programming
that many developers find overly complicated. To demonstrate its usefulness, our patent
cocitation example expresses what would be a complex MapReduce program in a dozen
lines of Pig Latin.
 