Database Reference
In-Depth Information
grunt> REGISTER pig-examples.jar;
Finally, we can invoke the function:
grunt> filtered_records = FILTER records BY temperature != 9999 AND
>>
com.hadoopbook.pig.IsGoodQuality(quality);
Pig resolves function calls by treating the function's name as a Java classname and at-
tempting to load a class of that name. (This, incidentally, is why function names are case
sensitive: because Java classnames are.) When searching for classes, Pig uses a classload-
er that includes the JAR files that have been registered. When running in distributed
mode, Pig will ensure that your JAR files get shipped to the cluster.
For the UDF in this example, Pig looks for a class with the name
com.hadoopbook.pig.IsGoodQuality , which it finds in the JAR file we re-
gistered.
Resolution of built-in functions proceeds in the same way, except for one difference: Pig
has a set of built-in package names that it searches, so the function call does not have to
be a fully qualified name. For example, the function MAX is actually implemented by a
class MAX in the package org.apache.pig.builtin . This is one of the packages
that Pig looks in, so we can write MAX rather than org.apache.pig.builtin.MAX
in our Pig programs.
We can add our package name to the search path by invoking Grunt with this command-
line argument: -Dudf.import.list=com.hadoopbook.pig . Alternatively, we
can shorten the function name by defining an alias, using the DEFINE operator:
grunt> DEFINE isGood com.hadoopbook.pig.IsGoodQuality();
grunt> filtered_records = FILTER records BY temperature != 9999 AND
>>
isGood(quality);
Defining an alias is a good idea if you want to use the function several times in the same
script. It's also necessary if you want to pass arguments to the constructor of the UDF's
implementation class.
TIP
If you add the lines to register JAR files and define function aliases to the .pigbootup file in your home
directory, they will be run whenever you start Pig.
Search WWH ::




Custom Search