FIGURE 2.12 Pig compilation and execution steps: Parser, Logical optimizer, MapReduce compiler, MapReduce optimizer, Hadoop. (From C. Olston et al., Pig Latin: A not-so-foreign language for data processing, in SIGMOD, pp. 1099-1110, 2008.)
To accommodate specialized data-processing tasks, Pig Latin has extensive sup-
port for user-defined functions (UDFs). The input and output of UDFs in Pig Latin
follow its fully nested data model. Pig Latin is architected such that the parsing of the
Pig Latin program and the logical plan construction is independent of the execution
platform. Only the compilation of the logical plan into a physical plan depends on
the specific execution platform chosen. Currently, Pig Latin programs are compiled
into sequences of MapReduce jobs that are executed using the Hadoop MapReduce
environment. In particular, a Pig Latin program goes through a series of transformation steps [109] before being executed, as depicted in Figure 2.12. The parsing step verifies that the program is syntactically correct and that all referenced variables are
defined. The output of the parser is a canonical logical plan with a one-to-one cor-
respondence between Pig Latin statements and logical operators that are arranged in
a directed acyclic graph (DAG). The logical plan generated by the parser is passed
through a logical optimizer. In this stage, logical optimizations such as projection
pushdown are carried out. The optimized logical plan is then compiled into a series
of MapReduce jobs that are then passed through another optimization phase. The
DAG of optimized MapReduce jobs is then topologically sorted and jobs are submit-
ted to Hadoop for execution.
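The final step described above, ordering the DAG of optimized MapReduce jobs before submission, can be sketched with a standard topological sort. The following is a minimal illustration, not Pig's actual code; the job names and dependency edges are invented for the example.

```python
from collections import deque

def topological_order(jobs, deps):
    """Kahn's algorithm: deps maps each job to the set of jobs it depends on."""
    indegree = {j: len(deps.get(j, ())) for j in jobs}
    ready = deque(j for j in jobs if indegree[j] == 0)
    order = []
    while ready:
        j = ready.popleft()
        order.append(j)
        # A job becomes ready once all jobs it depends on have been ordered.
        for k in jobs:
            if j in deps.get(k, ()):
                indegree[k] -= 1
                if indegree[k] == 0:
                    ready.append(k)
    if len(order) != len(jobs):
        raise ValueError("job graph contains a cycle")
    return order

# Hypothetical job DAG: a load/filter job feeds a join, which feeds a group-by.
jobs = ["load_filter", "join", "group_aggregate"]
deps = {"join": {"load_filter"}, "group_aggregate": {"join"}}
submission_order = topological_order(jobs, deps)
```

Submitting jobs in this order guarantees that every job's inputs have been produced by an earlier job before it runs on Hadoop.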
2.4.3 Hive
The Hive project* is an open-source data warehousing solution that has been built
by the Facebook Data Infrastructure Team on top of the Hadoop environment [123].
The main goal of this project is to bring the familiar relational database concepts
(e.g., tables, columns, partitions) and a subset of SQL to the unstructured world of
Hadoop while still maintaining the extensibility and flexibility that Hadoop provides.
* http://hadoop.apache.org/hive/.
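The relational concepts Hive layers on Hadoop can be made concrete with a toy model; this is an illustrative assumption, not Hive's implementation. A "table" is a collection of rows, a "partition" is a subset of the data keyed by a column value, and a simple SQL-like query becomes a scan restricted to the matching partition.

```python
# Toy table partitioned by a date column "ds", a common Hive warehouse layout.
# All names and data here are invented for illustration.
table = {
    "2008-06-01": [{"user": "alice", "clicks": 3}, {"user": "bob", "clicks": 7}],
    "2008-06-02": [{"user": "alice", "clicks": 5}],
}

def select(table, columns, partition):
    """Rough analogue of: SELECT <columns> FROM table WHERE ds = <partition>."""
    # Partition pruning: only the requested partition's rows are scanned,
    # mirroring how partitioning limits the files a query must read.
    return [{c: row[c] for c in columns} for row in table.get(partition, [])]

result = select(table, ["user"], "2008-06-01")
```

The point of the sketch is the access path: because the data is laid out by partition, the query never touches rows outside the requested date.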