Programming with Pig - Hadoop in Action

10.9 Summary

Pig is a higher-level data processing layer on top of Hadoop. Its Pig Latin language gives programmers a more intuitive way to specify data flows. It supports schemas for processing structured data, yet it is flexible enough to work with unstructured text or semistructured XML data. It is extensible through user-defined functions (UDFs). It vastly simplifies data joining and job chaining, two aspects of MapReduce programming that many developers find overly complicated. To demonstrate its usefulness, our patent cocitation example shows how a complex MapReduce program can be written in a dozen lines of Pig Latin.
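
As a reminder of what the cocitation computation involves, here is a minimal sketch in plain Python rather than Pig Latin. It is not the chapter's script: the function name `cocitation_counts` and the sample data are made up for illustration, and it assumes the input is a list of (citing, cited) patent pairs. It mirrors the same data flow of grouping, self-joining, and counting on a single machine.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cocitation_counts(citations):
    """Count how often each pair of patents is cited together.

    `citations` is an iterable of (citing, cited) pairs (assumed format).
    Returns a Counter keyed by (patent_a, patent_b) with patent_a < patent_b.
    """
    # Group step: collect the set of patents cited by each citing patent.
    cited_by = defaultdict(set)
    for citing, cited in citations:
        cited_by[citing].add(cited)

    # Self-join and count step: every pair of patents that share a citing
    # patent is one cocitation.
    counts = Counter()
    for cited_set in cited_by.values():
        for pair in combinations(sorted(cited_set), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample data: patent 10 cites 1, 2, and 3; 11 cites 1 and 2;
# 12 cites 2 and 3.
sample = [(10, 1), (10, 2), (10, 3), (11, 1), (11, 2), (12, 2), (12, 3)]
result = cocitation_counts(sample)
```

Here patents 1 and 2 are cited together by patents 10 and 11, so their pair has a count of 2. On Hadoop the grouping and pairing steps would each become a MapReduce job that must be chained by hand, which is the work Pig takes over.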
