Precisely which Hadoop filesystem is used is determined by the fs.defaultFS property in the site file for Hadoop Core. See The Command-Line Interface for more details on how to configure this property.
These commands are mostly self-explanatory, except set, which is used to set options that control Pig's behavior (including arbitrary MapReduce job properties). The debug option is used to turn debug logging on or off from within a script (you can also control the log level when launching Pig, using the -d or -debug option):
grunt> set debug on
Another useful option is the job.name option, which gives a Pig job a meaningful name, making it easier to pick out your Pig MapReduce jobs when running on a shared Hadoop cluster. If Pig is running a script (rather than operating as an interactive query from Grunt), its job name defaults to a value based on the script name.
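As a minimal sketch of set in use, the following Grunt session names a job and sets a default reducer count. The job name string is a hypothetical placeholder and the parallelism value is arbitrary; default_parallel is a standard Pig option, shown here only to illustrate that set takes a key followed by a value.

```pig
grunt> set job.name 'Max temperature by year'
grunt> set default_parallel 10
```

Options set this way apply to jobs launched afterward in the same session or script.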
The difference between exec and run is that exec runs the script in batch mode in a new Grunt shell, so any aliases defined in the script are not accessible to the shell after the script has completed. On the other hand, when running a script with run, it is as if the contents of the script had been entered manually, so the command history of the invoking shell contains all the statements from the script. Multiquery execution, where Pig executes a batch of statements in one go, is used only by exec, not run.
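The contrast between the two commands can be sketched in a short Grunt session. The file name script.pig is hypothetical, and the sketch assumes the script defines an alias named A; error and result output are omitted.

```pig
-- exec: the script runs in its own shell, so its aliases do not survive
grunt> exec script.pig
-- A is not defined in the invoking shell, so this statement fails
grunt> describe A;

-- run: the statements are replayed in the invoking shell
grunt> run script.pig
-- A is now defined here, so this statement succeeds
grunt> describe A;
```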
CONTROL FLOW
By design, Pig Latin lacks native control flow statements. The recommended approach for writing programs that have conditional logic or loop constructs is to embed Pig Latin in another language, such as Python, JavaScript, or Java, and manage the control flow from there. In this model, the host script uses a compile-bind-run API to execute Pig scripts and retrieve their status. Consult the Pig documentation for details of the API.
Embedded Pig programs always run in a JVM, so for Python and JavaScript you use the pig command followed by the name of your script, and the appropriate Java scripting engine will be selected (Jython for Python, Rhino for JavaScript).
Expressions
An expression is something that is evaluated to yield a value. Expressions can be used in Pig as a part of a statement containing a relational operator. Pig has a rich variety of expressions, many of which will be familiar from other programming languages. They are listed in Table 16-5, with brief descriptions and examples. We will see examples of many of these expressions throughout the chapter.
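To give a flavor of how expressions appear inside relational statements, here is a small sketch. The input path, relation names, and field names are hypothetical, chosen only for illustration; the file is assumed to be tab-separated with a year, a temperature in tenths of a degree, and a quality code.

```pig
-- Hypothetical input: tab-separated year, temperature, quality code
records = LOAD 'input/sample.txt'
    AS (year:chararray, temperature:int, quality:int);

-- Comparison and Boolean expressions in a FILTER condition
good = FILTER records BY temperature != 9999
    AND (quality == 0 OR quality == 1);

-- Field references (by name and by position), arithmetic,
-- and a conditional (bincond) expression in a FOREACH projection
result = FOREACH good GENERATE
    year,
    $1 / 10.0 AS celsius,
    (temperature >= 0 ? 'non-negative' : 'negative') AS sign;

DUMP result;
```

Here $1 refers to the second field (temperature) by position, and the conditional expression must be wrapped in parentheses.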