(1949,111,1)
(1949,78,1)
Parameter Substitution
If you have a Pig script that you run on a regular basis, it's quite common to want to be
able to run the same script with different parameters. For example, a script that runs daily
may use the date to determine which input files it runs over. Pig supports parameter
substitution, where parameters in the script are substituted with values supplied at runtime.
Parameters are denoted by identifiers prefixed with a $ character; for example, $input and
$output are used in the following script to specify the input and output paths:
-- max_temp_param.pig
records = LOAD '$input' AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records BY temperature != 9999 AND
  quality IN (0, 1, 4, 5, 9);
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group,
  MAX(filtered_records.temperature);
STORE max_temp INTO '$output';
Parameters can be specified when launching Pig using the -param option, once for each
parameter:
% pig -param input=/user/tom/input/ncdc/micro-tab/sample.txt \
>   -param output=/tmp/out \
>   ch16-pig/src/main/pig/max_temp_param.pig
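The daily-run case mentioned earlier could be handled with a small wrapper script that
derives the paths from a date. The following is only a sketch: the pig_daily_cmd function
and the /data/ncdc and /results/max_temp directories are hypothetical names chosen for
illustration, and the function prints the command rather than running it so that it can be
inspected first.

```shell
#!/bin/sh
# Sketch: build the pig command line for one day's run.
# Assumption: input files are named by date under /data/ncdc (hypothetical layout).

pig_daily_cmd() {
  day=$1   # date in YYYY-MM-DD form
  printf 'pig -param input=%s -param output=%s %s\n' \
    "/data/ncdc/${day}.txt" \
    "/results/max_temp/${day}" \
    "max_temp_param.pig"
}

# Print the command for today's date.
pig_daily_cmd "$(date +%Y-%m-%d)"
```

Printing the command keeps the sketch free of side effects; to actually launch Pig, the
printed line could be executed from a scheduler once the paths have been checked.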
You can also put parameters in a file and pass them to Pig using the -param_file option.
For example, we can achieve the same result as the previous command by placing the
parameter definitions in a file:
# Input file
input=/user/tom/input/ncdc/micro-tab/sample.txt
# Output file
output=/tmp/out
The pig invocation then becomes:
% pig -param_file ch16-pig/src/main/pig/max_temp_param.param \
>   ch16-pig/src/main/pig/max_temp_param.pig
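If the parameter values change from run to run, the parameter file itself can be generated
just before Pig is launched. The sketch below assumes a hypothetical helper named
write_param_file and a temporary file location of our own choosing; it writes a file in
the same name=value format shown above and then prints the matching pig command.

```shell
#!/bin/sh
# Sketch: generate a parameter file, then show the pig command that would use it.
# write_param_file is a hypothetical helper, not part of Pig.

write_param_file() {
  # $1 = destination file, $2 = input path, $3 = output path
  cat > "$1" <<EOF
# Input file
input=$2
# Output file
output=$3
EOF
}

# Assumption: a per-process temporary file is an acceptable place for the parameters.
param_file="${TMPDIR:-/tmp}/max_temp_param.$$.param"
write_param_file "$param_file" /user/tom/input/ncdc/micro-tab/sample.txt /tmp/out

cat "$param_file"
echo "pig -param_file $param_file max_temp_param.pig"
```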
You can specify multiple parameter files by using -param_file repeatedly. You can also use
a combination of -param and -param_file options; if any parameter is