Pig Latin
This section gives an informal description of the syntax and semantics of the Pig Latin
programming language. [98] It is not meant to offer a complete reference to the language, [99] but
there should be enough here for you to get a good understanding of Pig Latin's constructs.
Structure
A Pig Latin program consists of a collection of statements. A statement can be thought of
as an operation or a command. [100] For example, a GROUP operation is a type of statement:
grouped_records = GROUP records BY year;
The command to list the files in a Hadoop filesystem is another example of a statement:
ls /
Statements are usually terminated with a semicolon, as in the example of the GROUP
statement. In fact, the GROUP statement must be terminated with a semicolon; it
is a syntax error to omit it. The ls command, on the other hand, does not have to be
terminated with a semicolon. As a general guideline, statements or commands for interactive
use in Grunt do not need the terminating semicolon. This group includes the interactive
Hadoop commands, as well as the diagnostic operators such as DESCRIBE. It's never an error
to add a terminating semicolon, so if in doubt, it's simplest to add one.
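As a minimal sketch of these rules, the following lines combine the statements discussed in this section. The comments label which kind of statement each one is; the grouping of DESCRIBE with the semicolon-optional commands follows the guideline above rather than anything verified separately.

```pig
-- Interactive Hadoop command: the terminating semicolon is optional.
ls /

-- Relational statements: the terminating semicolon is required.
records = LOAD 'input/ncdc/micro-tab/sample.txt'
    AS (year:chararray, temperature:int, quality:int);
grouped_records = GROUP records BY year;

-- Diagnostic operator: per the guideline above a semicolon is not needed,
-- but adding one is never an error.
DESCRIBE grouped_records;
```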
Statements that have to be terminated with a semicolon can be split across multiple lines
for readability:
records = LOAD 'input/ncdc/micro-tab/sample.txt'
    AS (year:chararray, temperature:int, quality:int);
Pig Latin has two forms of comments. Double hyphens are used for single-line comments.
Everything from the first hyphen to the end of the line is ignored by the Pig Latin
interpreter:
-- My program
DUMP A; -- What's in A?
C-style comments are more flexible since they delimit the beginning and end of the
comment block with /* and */ markers. They can span lines or be embedded in a single line:
/*
* Description of my program spanning
* multiple lines.
*/
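To illustrate the embedded form, here is a minimal sketch in which a C-style comment sits in the middle of a statement. The input paths and relation names are hypothetical, chosen only for illustration.

```pig
A = LOAD 'input/a.txt';   -- hypothetical input path
B = LOAD 'input/b.txt';   -- hypothetical input path
-- The /* ... */ comment below is skipped, so this is an ordinary JOIN.
C = JOIN A BY $0, /* ignored */ B BY $1;
DUMP C;
```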