FOREACH...GENERATE statements, and adds them to the logical plan without executing
them. The trigger for Pig to start execution is the DUMP statement. At that point, the
logical plan is compiled into a physical plan and executed.
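As a minimal sketch of this behavior, consider the short script below. The input path and schema are hypothetical and chosen only for illustration; the relation name max_temp is the one used with EXPLAIN later in this section.

```pig
-- Hypothetical input path and schema (assumptions for illustration).
records = LOAD 'input/sample.txt' AS (year:chararray, temperature:int);

-- Each statement below is only parsed and added to the logical plan;
-- no data is read at this point.
grouped = GROUP records BY year;
max_temp = FOREACH grouped GENERATE group, MAX(records.temperature);

-- DUMP is the trigger: the logical plan is compiled into a physical
-- plan and executed, and the result is printed to the console.
DUMP max_temp;
```

If these statements are entered one at a time in an interactive session, no job should start until the DUMP line is reached.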
MULTIQUERY EXECUTION
Because DUMP is a diagnostic tool, it will always trigger execution. However, the STORE command is
different. In interactive mode, STORE acts like DUMP and will always trigger execution (this includes the
run command), but in batch mode it will not (this includes the exec command). The reason for this is
efficiency. In batch mode, Pig will parse the whole script to see whether there are any optimizations that
could be made to limit the amount of data to be written to or read from disk. Consider the following
simple example:
A = LOAD 'input/pig/multiquery/A';
B = FILTER A BY $1 == 'banana';
C = FILTER A BY $1 != 'banana';
STORE B INTO 'output/b';
STORE C INTO 'output/c';
Relations B and C are both derived from A, so to save reading A twice, Pig can run this script as a single
MapReduce job by reading A once and writing two output files from the job, one for each of B and C.
This feature is called multiquery execution.
In previous versions of Pig that did not have multiquery execution, each STORE statement in a script run
in batch mode triggered execution, resulting in a job for each STORE statement. It is possible to restore
the old behavior by disabling multiquery execution with the -M or -no_multiquery option to pig.
The physical plan that Pig prepares is a series of MapReduce jobs. In local mode, Pig runs
these jobs in the local JVM; in MapReduce mode, it runs them on a Hadoop cluster.
NOTE
You can see the logical and physical plans created by Pig using the EXPLAIN command on a relation
(for example, EXPLAIN max_temp;).
EXPLAIN will also show the MapReduce plan, which shows how the physical operators are grouped into
MapReduce jobs. This is a good way to find out how many MapReduce jobs Pig will run for your
query.
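A short sketch of how EXPLAIN might be used on such a relation follows. The input path and schema are assumptions made for illustration; only the EXPLAIN max_temp; statement comes from the note above.

```pig
-- Hypothetical input path and schema (assumptions for illustration).
records = LOAD 'input/sample.txt' AS (year:chararray, temperature:int);
grouped = GROUP records BY year;
max_temp = FOREACH grouped GENERATE group, MAX(records.temperature);

-- Prints the logical, physical, and MapReduce plans for max_temp.
-- The MapReduce plan shows how the operators are grouped into jobs.
EXPLAIN max_temp;
```

Reading the MapReduce plan section of the output is one way to check how many jobs a script like this would be split into before running it on real data.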
The relational operators that can be a part of a logical plan in Pig are summarized in
Table 16-1. We go through the operators in more detail in Data Processing Operators.