Figure 9.11 Finding the 10th and 90th percentile
Now that you know how to use UDFs to extend the functionality of Pig, it is
time to take it a step further and create your own UDF.
Building Your Own UDFs for Pig
As mentioned earlier, writing your own UDF is not trivial unless you are an
experienced Java programmer. However, if you have experience in another
object-oriented programming language such as C#, you should be able to
transition to writing UDFs in Java without too much difficulty.
One thing you may want to do to make things easier is to download and
install a Java integrated development environment (IDE) such as Eclipse
(http://www.eclipse.org/). If you are used to working in Visual Studio, you
should be comfortable developing in Eclipse.
You can create several types of UDFs, depending on the functionality you
need. The most common type is the eval function. An eval function accepts a
tuple as input, performs some processing on it, and returns a result. Eval
functions are typically used in conjunction with a FOREACH statement in Pig
Latin. For example, the following script calls a custom UDF to convert string
values to lowercase:
Register C:\hdp\hadoop\pig-0.11.0.1.3.0.0-0380\SampleUDF.jar;
Define lcase com.BigData.hadoop.pig.SampleUDF.Lower;
FlightData = LOAD '/user/test/FlightPerformance.csv'
    using PigStorage(',')
    as
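To give a sense of what the Java side of such a UDF looks like, here is a
minimal sketch of an eval function that could back the lcase alias in the
script above. It assumes a class named Lower in the
com.BigData.hadoop.pig.SampleUDF package, matching the Define statement; the
actual contents of SampleUDF.jar may differ.

package com.BigData.hadoop.pig.SampleUDF;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Illustrative eval function: lowercases the first field of the input tuple.
public class Lower extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Return null for empty or null input so bad records do not fail the job.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        // Treat the first field as a string and return its lowercase form.
        return ((String) input.get(0)).toLowerCase();
    }
}

Compiled into a JAR and registered as shown above, the function is mapped to
the lcase alias by the Define statement and can then be applied to a field
inside a FOREACH ... GENERATE statement.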
 