Precisely which Hadoop filesystem is used is determined by the fs.defaultFS
property in the site file for Hadoop Core. See The Command-Line Interface for more
details on how to configure this property.
These commands are mostly self-explanatory, except set, which is used to set options
that control Pig's behavior (including arbitrary MapReduce job properties). The debug
option is used to turn debug logging on or off from within a script (you can also control
the log level when launching Pig, using the -d or -debug option):
grunt> set debug on
Another useful option is the job.name option, which gives a Pig job a meaningful
name, making it easier to pick out your Pig MapReduce jobs when running on a shared
Hadoop cluster. If Pig is running a script (rather than operating as an interactive query
from Grunt), its job name defaults to a value based on the script name.
There are two commands in Table 16-4 for running a Pig script, exec and run. The
difference is that exec runs the script in batch mode in a new Grunt shell, so any aliases
defined in the script are not accessible to the shell after the script has completed. On the
other hand, when running a script with run, it is as if the contents of the script had been
entered manually, so the command history of the invoking shell contains all the statements
from the script. Multiquery execution, where Pig executes a batch of statements in one go
(see Multiquery Execution), is used only by exec, not run.
CONTROL FLOW
By design, Pig Latin lacks native control flow statements. The recommended approach for writing
programs that have conditional logic or loop constructs is to embed Pig Latin in another language, such as
Python, JavaScript, or Java, and manage the control flow from there. In this model, the host script uses a
compile-bind-run API to execute Pig scripts and retrieve their status. Consult the Pig documentation for
details of the API.
Embedded Pig programs always run in a JVM, so for Python and JavaScript you use the pig command
followed by the name of your script, and the appropriate Java scripting engine will be selected (Jython
for Python, Rhino for JavaScript).
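As a minimal sketch of the compile-bind-run model in Python, assuming Pig's Jython scripting class org.apache.pig.scripting.Pig and its compile, bind, runSingle, and isSuccessful calls: the Pig Latin template, the parameter names in_path and out_path, the input/ and output/ paths, and the helper functions run_with_pig and process_years are all hypothetical and chosen only for illustration. The Pig import is deferred into the function that needs it, so the looping logic stays ordinary Python.

```python
# Hypothetical Pig Latin template; $in_path and $out_path are bound per run.
SCRIPT = """
records = LOAD '$in_path' AS (year:chararray, temperature:int);
filtered = FILTER records BY temperature != 9999;
STORE filtered INTO '$out_path';
"""


def run_with_pig(params):
    """Compile, bind, and run SCRIPT once.

    Only works when launched through the pig command (Jython), because the
    import below resolves to a Java class on Pig's classpath.
    """
    from org.apache.pig.scripting import Pig  # not available in plain CPython

    compiled = Pig.compile(SCRIPT)   # compile: parse the template once
    bound = compiled.bind(params)    # bind: substitute $in_path and $out_path
    stats = bound.runSingle()        # run: execute and collect job statistics
    return stats.isSuccessful()


def process_years(years, run_job=run_with_pig):
    """Loop in the host language, stopping at the first failed Pig run."""
    completed = []
    for year in years:
        params = {
            "in_path": "input/%s" % year,    # hypothetical input layout
            "out_path": "output/%s" % year,  # hypothetical output layout
        }
        if not run_job(params):
            break  # conditional logic lives in Python, not in Pig Latin
        completed.append(year)
    return completed
```

Saved in a file and launched with the pig command followed by the file name, a call to process_years at the end of the file would drive one Pig run per year. Passing a different run_job function makes it possible to exercise the loop without a cluster.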
Expressions
An expression is something that is evaluated to yield a value. Expressions can be used in
Pig as a part of a statement containing a relational operator. Pig has a rich variety of
expressions, many of which will be familiar from other programming languages. They are
listed in Table 16-5, with brief descriptions and examples. We will see examples of many
of these expressions throughout the chapter.