To use the DataFu UDFs, download the datafu.jar file from
www.wiley.com/go/microsoftbigdatasolutions and place it in the same
directory as the piggybank.jar file. Register the jar file in your script,
and then define an alias for the Quantile function, passing in the quantile
values you want to calculate (here, the 10th and 90th percentiles):
REGISTER 'C:\hdp\hadoop\pig-0.11.0.1.3.0.0-0380\datafu-0.0.10.jar';
DEFINE Quantile datafu.pig.stats.Quantile('.10','.90');
Load and group the data:
SpeedData = LOAD '/user/test/traffic.txt' USING PigStorage()
    AS (dtstamp:chararray, sensorid:int, speed:double);
SpeedDataGrouped = GROUP SpeedData ALL;
Sort the speeds within the group and pass the sorted data to the Quantile
function, which expects its input to be sorted. Then dump the results to the
command-line console (see Figure 9.11). Using this data, you can then write
a script to filter out the outliers:
QuantSpeeds = FOREACH SpeedDataGrouped {
    -- Sort inside the nested block, then pass the sorted bag to Quantile
    SpeedSorted = ORDER SpeedData BY speed;
    GENERATE Quantile(SpeedSorted.speed);
};
DUMP QuantSpeeds;
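The quantile values computed above can serve as bounds for an outlier filter. The following is only a minimal sketch, not the script the text refers to. The relation names Bounds and TypicalSpeeds and the field names low and high are hypothetical. It also assumes that Quantile returns a single tuple holding the two requested quantiles in the order they were defined (.10, then .90).

```pig
-- Assumes SpeedData and QuantSpeeds from the script above.
-- Unpack the quantile tuple into two named fields (names are illustrative).
Bounds = FOREACH QuantSpeeds GENERATE FLATTEN($0) AS (low:double, high:double);

-- Bounds holds exactly one row, so its fields can be referenced as scalars.
TypicalSpeeds = FILTER SpeedData
    BY speed >= Bounds.low AND speed <= Bounds.high;

DUMP TypicalSpeeds;
```

This keeps only the readings between the two quantile values. To inspect the outliers themselves, invert the condition (speed < Bounds.low OR speed > Bounds.high).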
 