Fig. 9.12 Pig compilation and execution steps
4. Map: A map is a collection of data items, where each item has an associated key
through which it can be looked up. As with bags, the schema of the constituent
data items is flexible. However, the keys are required to be data atoms, e.g.
"k1" → ("alice", "lakers")
"k2" → "20"
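The map type above can be approximated outside Pig to make the two rules concrete: keys must be atoms, while values may be of any type. The following is a minimal Python sketch, not Pig Latin itself. The names `record`, `is_atom`, and `lookup` are hypothetical, and treating Python scalars as atoms is an assumption made only for illustration.

```python
# A Pig-style map sketched as a Python dict (illustration only, not Pig code).
# Keys are atoms; values may be atoms, tuples, or other nested structures.
record = {
    "k1": ("alice", "lakers"),  # value is a tuple
    "k2": "20",                 # value is an atom
}


def is_atom(value):
    """Treat simple scalar values as atoms (an assumption of this sketch)."""
    return isinstance(value, (str, int, float, bool))


def lookup(pig_map, key):
    """Look up a key; reject non-atomic keys, return None if the key is absent."""
    if not is_atom(key):
        raise TypeError("map keys must be atoms")
    return pig_map.get(key)
```

Calling `lookup(record, "k1")` returns the tuple stored under that key, while a tuple used as a key is rejected before any lookup takes place.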
To accommodate specialized data processing tasks, Pig Latin has extensive
support for user-defined functions (UDFs). The input and output of UDFs in Pig
Latin follow its fully nested data model. Pig Latin is architected such that the parsing
of the Pig Latin program and the logical plan construction are independent of the
execution platform. Only the compilation of the logical plan into a physical plan
depends on the specific execution platform chosen. Currently, Pig Latin programs
are compiled into sequences of MapReduce jobs which are executed using the
Hadoop MapReduce environment. In particular, a Pig Latin program goes through a
series of transformation steps [188] before being executed, as depicted in Fig. 9.12.
The parsing step verifies that the program is syntactically correct and that all
referenced variables are defined. The output of the parser is a canonical logical
plan with a one-to-one correspondence between Pig Latin statements and logical
operators which are arranged in a directed acyclic graph (DAG). The logical plan
generated by the parser is passed through a logical optimizer. In this stage, logical
optimizations such as projection pushdown are carried out. The optimized logical
plan is then compiled into a series of MapReduce jobs which are then passed
through another optimization phase. The DAG of optimized MapReduce jobs is
then topologically sorted and jobs are submitted to Hadoop for execution.
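The final step, ordering the job DAG so that every job is submitted only after the jobs it depends on, can be sketched with Kahn's algorithm. This is an illustrative Python sketch rather than Pig's actual implementation. The job names and the `topological_order` helper are hypothetical, and every dependency is assumed to appear as a key in the input dictionary.

```python
from collections import deque


def topological_order(dependencies):
    """Order jobs with Kahn's algorithm.

    `dependencies` maps each job name to the list of jobs it depends on.
    Every dependency must itself be a key of the dictionary.
    """
    # Number of unfinished prerequisites per job.
    indegree = {job: len(deps) for job, deps in dependencies.items()}
    # Reverse edges: which jobs are waiting on a given job.
    dependents = {job: [] for job in dependencies}
    for job, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(job)

    # Start with jobs that have no prerequisites (sorted for determinism).
    ready = deque(sorted(job for job, n in indegree.items() if n == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for waiting in dependents[job]:
            indegree[waiting] -= 1
            if indegree[waiting] == 0:
                ready.append(waiting)

    if len(order) != len(dependencies):
        raise ValueError("job graph contains a cycle")
    return order


# Hypothetical MapReduce jobs produced from a small script.
jobs = {
    "load_and_filter": [],
    "load_users": [],
    "join": ["load_and_filter", "load_users"],
    "group_and_count": ["join"],
}
submission_order = topological_order(jobs)
```

Here `submission_order` lists both load jobs before `join`, and `join` before `group_and_count`, which is the property a scheduler needs before handing the jobs to the execution platform.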
Hive
The Hive project [11] is an open-source data warehousing solution which has
been built by the Facebook Data Infrastructure Team on top of the Hadoop
environment [222]. The main goal of this project is to bring the familiar relational
database concepts (e.g. tables, columns, partitions) and a subset of SQL to the