Some common string functions are LOWER, TRIM, SUBSTRING, and
REPLACE. The following code trims leading and trailing spaces from the
airport codes:
FlightDataTrimmed = FOREACH FlightData
GENERATE TRIM(AirportCode) AS AirportCode2;
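The other string functions named above can be combined in the same way. The following is a minimal sketch, not code from the original example: the input file flights.csv, its single-column schema, and the output field names are assumptions made only so that the script stands on its own.

```pig
-- Hypothetical input: a text file with one airport code per line
FlightData = LOAD 'flights.csv' USING PigStorage(',')
    AS (AirportCode:chararray);

FlightDataClean = FOREACH FlightData GENERATE
    -- Trim surrounding spaces, then convert to lowercase
    LOWER(TRIM(AirportCode)) AS CodeLower,
    -- Characters from index 0 up to (but not including) index 3
    SUBSTRING(TRIM(AirportCode), 0, 3) AS CodePrefix,
    -- REPLACE treats its second argument as a regular expression;
    -- here it removes every space character
    REPLACE(AirportCode, ' ', '') AS CodeNoSpaces;

DUMP FlightDataClean;
```

Function names in Pig are case sensitive, so the built-ins are written in uppercase here, matching the TRIM call above.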
Executing User-defined Functions
In the preceding section, you looked at some of the useful built-in functions
available in Pig. Because Pig knows where its built-in functions reside, you
do not have to register them or use fully qualified names to invoke them.
Use the built-in functions whenever they meet your processing needs.
However, the set of built-in functions is limited and will not always meet
your requirements. In these cases, you can use user-defined functions (UDFs).
Creating your own functions is not trivial, so you should investigate whether
a publicly available UDF could meet your needs before going to the trouble
of creating your own. Two useful open source libraries containing prebuilt
UDFs are PiggyBank and DataFu, discussed next.
PiggyBank
PiggyBank is a repository for UDFs provided by the open source community.
Unlike the built-in functions, these UDFs require you to register the jar
file that contains their compiled code before you can use them. Once the jar
is registered, you can call a UDF in your Pig scripts by its fully qualified
name, or use the define statement to give the UDF an alias. The following
code registers the piggybank.jar file and defines an alias for its Reverse
function, which reverses a string. The HCatLoader loads data from a table
defined using
HCatalog (covered in Chapter 7, “Expanding Your Capability with HBase
and HCatalog”):
REGISTER piggybank.jar;
DEFINE reverse
    org.apache.pig.piggybank.evaluation.string.Reverse();
FlightData = LOAD 'FlightData'
    USING org.apache.hcatalog.pig.HCatLoader();
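To show how a registered UDF might then be invoked, here is a hedged sketch of both calling styles described above: through the alias and through the fully qualified name. It is an illustration rather than a continuation of the original listing. The input file airport_codes.txt, the AirportCode column, and the relation names are assumptions, and a plain file is loaded so that the sketch does not depend on an HCatalog table.

```pig
-- Assumes piggybank.jar is in the directory Pig is launched from
REGISTER piggybank.jar;
DEFINE reverse
    org.apache.pig.piggybank.evaluation.string.Reverse();

-- Hypothetical input: one airport code per line
Codes = LOAD 'airport_codes.txt' AS (AirportCode:chararray);

-- Option 1: call the UDF through the alias created with DEFINE
ReversedByAlias = FOREACH Codes
    GENERATE reverse(AirportCode) AS ReversedCode;

-- Option 2: call the UDF by its fully qualified name (no DEFINE needed)
ReversedByName = FOREACH Codes
    GENERATE org.apache.pig.piggybank.evaluation.string.Reverse(AirportCode)
        AS ReversedCode;

DUMP ReversedByAlias;
```

The alias form keeps scripts shorter when a UDF is called in several places. The fully qualified form avoids the extra statement when the UDF is used only once.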