(1949,111,1)
(1949,78,1)
Parameter Substitution
If you have a Pig script that you run on a regular basis, it's quite common to want to be
able to run the same script with different parameters. For example, a script that runs daily
may use the date to determine which input files it runs over. Pig supports parameter
substitution, where parameters in the script are substituted with values supplied at runtime.
Parameters are denoted by identifiers prefixed with a $ character; for example, $input
and $output are used in the following script to specify the input and output paths:
-- max_temp_param.pig
records = LOAD '$input' AS (year:chararray, temperature:int,
  quality:int);
filtered_records = FILTER records BY temperature != 9999 AND
  quality IN (0, 1, 4, 5, 9);
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group,
  MAX(filtered_records.temperature);
STORE max_temp INTO '$output';
Parameters can be specified when launching Pig using the -param option, once for each
parameter:
% pig -param input=/user/tom/input/ncdc/micro-tab/sample.txt \
>   -param output=/tmp/out \
>   ch16-pig/src/main/pig/max_temp_param.pig
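For the daily-run case mentioned earlier, the -param values can be computed by a small wrapper script. The following is a minimal sketch, not part of the original example: the function name build_pig_command and the per-day directory layout under /user/tom/input/ncdc/daily are assumptions made for illustration. It prints the pig command rather than running it, so the result can be inspected first.

```shell
#!/bin/sh
# build_pig_command DATE
# Prints a pig invocation whose input and output paths are derived from DATE.
# The per-day path layout below is hypothetical; adjust it to your own data.
build_pig_command() {
    run_date="$1"
    input="/user/tom/input/ncdc/daily/${run_date}.txt"   # hypothetical path
    output="/tmp/out/${run_date}"                        # hypothetical path
    printf 'pig -param input=%s -param output=%s %s\n' \
        "$input" "$output" "ch16-pig/src/main/pig/max_temp_param.pig"
}

# Show the command for today's date (format YYYY-MM-DD).
build_pig_command "$(date +%Y-%m-%d)"
```

Once the printed command looks right, it could be run by piping the output to sh or by replacing printf with a direct call to pig.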
You can also put parameters in a file and pass them to Pig using the -param_file option.
For example, we can achieve the same result as the previous command by placing
the parameter definitions in a file:
# Input file
input=/user/tom/input/ncdc/micro-tab/sample.txt
# Output file
output=/tmp/out
The pig invocation then becomes:
% pig -param_file ch16-pig/src/main/pig/max_temp_param.param \
>   ch16-pig/src/main/pig/max_temp_param.pig
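A parameter file can also be generated just before each run, which is one way to keep date-dependent paths out of the script itself. The sketch below is illustrative only: write_param_file is a hypothetical helper, and the per-day paths are assumptions, not locations used by the example above. It writes the same name=value format, with # comment lines, as the parameter file shown earlier.

```shell
#!/bin/sh
# write_param_file FILE DATE
# Writes input/output parameter definitions for one day's run to FILE.
# The per-day paths are hypothetical.
write_param_file() {
    cat > "$1" <<EOF
# Input file
input=/user/tom/input/ncdc/daily/${2}.txt
# Output file
output=/tmp/out/${2}
EOF
}

# Generate a parameter file for today and display it.
param_file=$(mktemp)
write_param_file "$param_file" "$(date +%Y-%m-%d)"
cat "$param_file"
```

The generated file would then be passed to Pig with -param_file in place of max_temp_param.param.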
You can specify multiple parameter files by using -param_file repeatedly. You can
also use a combination of -param and -param_file options; if any parameter is