Pig Latin
This section gives an informal description of the syntax and semantics of the Pig Latin
programming language; there should be enough here for you to get a good understanding of
Pig Latin's constructs.
Structure
A Pig Latin program consists of a collection of statements. A statement can be thought of
as an operation or a command. For example, a GROUP operation is a type of statement:
grouped_records = GROUP records BY year;
The command to list the files in a Hadoop filesystem is another example of a statement:
ls /
Statements are usually terminated with a semicolon, as in the example of the GROUP
statement. In fact, this is an example of a statement that must be terminated with a
semicolon; it is a syntax error to omit it. The ls command, on the other hand, does not
have to be terminated with a semicolon. As a general guideline, statements or commands for
interactive use in Grunt do not need the terminating semicolon. This group includes the
interactive Hadoop commands, as well as the diagnostic operators such as DESCRIBE. It's
never an error to add a terminating semicolon, so if in doubt, it's simplest to add one.
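As a rough illustration of this guideline, a Grunt session might mix both kinds of input. This is only a sketch, not captured output; it assumes a relation named records has already been loaded, and the lines beginning with -- are comments.

```pig
-- Interactive Hadoop command: the terminating semicolon is optional
ls /

-- Pig Latin operator: the terminating semicolon is required
grouped_records = GROUP records BY year;

-- Diagnostic operator: the semicolon is added here because it is never an error
DESCRIBE grouped_records;
```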
Statements that have to be terminated with a semicolon can be split across multiple lines
for readability:
records = LOAD 'input/ncdc/micro-tab/sample.txt'
    AS (year:chararray, temperature:int, quality:int);
Pig Latin has two forms of comments. Double hyphens are used for single-line comments.
Everything from the first hyphen to the end of the line is ignored by the Pig Latin
interpreter:
-- My program
DUMP A; -- What's in A?
C-style comments are more flexible since they delimit the beginning and end of the
comment block with /* and */ markers. They can span lines or be embedded in a single line:
/*
* Description of my program spanning
* multiple lines.
*/
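A C-style comment embedded in a single line might look like the following sketch. It reuses the records relation and year field from the earlier examples; the comment text itself is purely illustrative.

```pig
grouped_records = GROUP records /* relation loaded earlier */ BY year;
DUMP grouped_records;
```

Everything between /* and */ is ignored, so the first statement behaves exactly like the uncommented GROUP statement.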