Figure 9.11 Finding the 10th and 90th percentile
Now that you know how to use UDFs to extend the functionality of Pig, it is
time to take it a step further and create your own UDF.
Building Your Own UDFs for Pig
As mentioned earlier, writing your own UDF is not trivial unless you are an
experienced Java programmer. However, if you have experience in another
object-oriented programming language such as C#, you should be able to
transition to writing UDFs in Java without too much difficulty.
One thing you may want to do to make things easier is to download and
install a Java integrated development environment (IDE) such as Eclipse
(http://www.eclipse.org/). If you are used to working in Visual Studio, you
should be comfortable developing in Eclipse.
You can create several types of UDFs, depending on the functionality you
need. The most common type is the eval function. An eval function accepts a
tuple as input, performs some processing on it, and returns a result. Eval
functions are typically used in conjunction with a FOREACH statement in Pig
Latin. For example, the following script calls a custom UDF to convert string
values to lowercase:
Register C:\hdp\hadoop\pig-0.11.0.1.3.0.0-0380\SampleUDF.jar;
Define lcase com.BigData.hadoop.pig.SampleUDF.Lower;
FlightData = LOAD '/user/test/FlightPerformance.csv'
    using PigStorage(',')
    as
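To give a sense of what the Java side of such a UDF looks like, here is a
minimal sketch of an eval function that could back the lcase alias in the
script above. It assumes a class named Lower in the
com.BigData.hadoop.pig.SampleUDF package, matching the Define statement; the
actual contents of SampleUDF.jar may differ.

package com.BigData.hadoop.pig.SampleUDF;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Illustrative eval function: lowercases the first field of the input tuple.
public class Lower extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Return null for empty or null input so bad records do not fail the job.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        // Treat the first field as a string and return its lowercase form.
        return ((String) input.get(0)).toLowerCase();
    }
}

Compiled into a JAR and registered as shown above, the function is mapped to
the lcase alias by the Define statement and can then be applied to a field
inside a FOREACH ... GENERATE statement.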
 