records = LOAD 'input/ncdc/micro-tab/sample.txt'
  AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records BY temperature != 9999 AND
  quality IN (0, 1, 4, 5, 9);
max_temp = max_by_group(filtered_records, year, temperature);
DUMP max_temp;
At runtime, Pig expands the macro using the macro definition. After expansion, the
program looks like the following; the expanded section is the pair of statements that
assign macro_max_by_group_A_0 and max_temp:
records = LOAD 'input/ncdc/micro-tab/sample.txt'
  AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records BY temperature != 9999 AND
  quality IN (0, 1, 4, 5, 9);
macro_max_by_group_A_0 = GROUP filtered_records by (year);
max_temp = FOREACH macro_max_by_group_A_0 GENERATE group,
  MAX(filtered_records.(temperature));
DUMP max_temp;
Normally you don't see the expanded form, because Pig creates it internally; however, in
some cases it is useful to see it when writing and debugging macros. You can get Pig to
perform macro expansion only (without executing the script) by passing the -dryrun
argument to pig.
Notice that the parameters that were passed to the macro (filtered_records, year,
and temperature) have been substituted for the names in the macro definition. Aliases
in the macro definition that don't have a $ prefix, such as A in this example, are local to
the macro definition and are rewritten at expansion time to avoid conflicts with aliases in
other parts of the program. In this case, A becomes macro_max_by_group_A_0 in the
expanded form.
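For reference, a macro definition consistent with the expansion shown earlier might look
like the following. This is a sketch reconstructed from the expanded form, not a copy of
the original definition: the parameter names X, group_key, and max_field and the return
alias Y are assumptions chosen for illustration, while the local alias A is the one named
in the text.

```pig
-- Sketch of a macro that yields the expansion above.
-- X, group_key, max_field, and Y are assumed names.
DEFINE max_by_group(X, group_key, max_field) RETURNS Y {
  -- A has no $ prefix, so it is local and gets renamed on expansion
  A = GROUP $X by $group_key;
  -- $Y is replaced by the alias on the left of the macro call (max_temp)
  $Y = FOREACH A GENERATE group, MAX($X.$max_field);
};
```

With this definition, the call max_by_group(filtered_records, year, temperature) replaces
$X with filtered_records, $group_key with year, and $max_field with temperature, which
matches the substitutions described above.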
To foster reuse, macros can be defined in files separate from Pig scripts, in which case
they need to be imported into any script that uses them. An import statement looks like this:
IMPORT './ch16-pig/src/main/pig/max_temp.macro';