FlightDataReversed = Foreach FlightData
Generate (origin, reverse(origin));
PiggyBank functions are organized into packages according to function type.
For example, the org.apache.pig.piggybank.evaluation package
contains functions for custom evaluation operations like aggregates and
column transformations. Within each package, the functions are further
organized into subgroups by purpose. The
org.apache.pig.piggybank.evaluation.string subgroup contains
custom functions for string evaluations, such as the reverse function seen
earlier. In addition to the evaluation functions, there are functions for
comparison, filtering, grouping, and loading/storing.
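As a minimal sketch of how the package layout shows up in a script, the following defines a short alias for a function by its fully qualified name. The jar location, the input file, and its schema are assumptions made for illustration only; adjust them to your environment.

```pig
-- Assumption: piggybank.jar is in the directory where Pig is started.
REGISTER 'piggybank.jar';

-- Alias for the Reverse function in the evaluation.string subgroup.
DEFINE Reverse org.apache.pig.piggybank.evaluation.string.Reverse();

-- Hypothetical input file and schema, for illustration only.
FlightData = LOAD '/user/test/flights.txt'
    USING PigStorage(',') AS (origin:chararray, dest:chararray);

-- Emit each origin code alongside its reversed form.
FlightDataReversed = FOREACH FlightData
    GENERATE origin, Reverse(origin);

DUMP FlightDataReversed;
```

Defining the alias once keeps the long package path out of the rest of the script. It also makes it easy to see which subgroup a function comes from.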
DataFu
DataFu was developed by LinkedIn to help analyze its big data sets. It is a
well-tested set of UDFs containing functions for data mining and advanced
statistics. You can download the jar file from
www.wiley.com/go/microsoftbigdatasolutions. To use the UDFs, you follow
the same process as with the PiggyBank library: register the jar file so that
Pig can locate it, and define an alias to use in your script. The following code
finds the median of a set of measures:
REGISTER 'C:\Hadoop\pig-0.9.3-SNAPSHOT\datafu-0.0.10.jar';
DEFINE Median datafu.pig.stats.Median();
TempData = LOAD '/user/test/temperature.txt'
    USING PigStorage() AS (dtstamp:chararray, sensorid:int, temp:double);
TempDataGrouped = GROUP TempData ALL;
-- Median expects sorted input, so order the bag first and pass the sorted bag.
MedTemp = FOREACH TempDataGrouped {
    TempSorted = ORDER TempData BY temp;
    GENERATE Median(TempSorted.temp);
};
Using UDFs
You can set up Hortonworks Data Platform (HDP) for Windows on a
development server to provide a local test environment that supports a
single-node deployment. (For a detailed discussion of installing the Hadoop