Figure 9.9 Splitting day and hour from an ISO date field
Now you can group the data by hour and get the maximum, minimum, and
average speed recorded during each hour (see Figure 9.10):
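-- Group the records by the hour extracted from the ISO date field.
SpeedDataGrouped = GROUP SpeedDataHour BY hr;
-- For each hour, compute the maximum, minimum, and average speed.
SpeedDataAgr = FOREACH SpeedDataGrouped
    GENERATE group, MAX(SpeedDataHour.speed),
             MIN(SpeedDataHour.speed), AVG(SpeedDataHour.speed);
DUMP SpeedDataAgr;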
Figure 9.10 Speed data aggregated by hour
Using DataFu for Advanced Statistics
Pig includes some basic statistical UDFs, such as the MAX, MIN, and AVG
functions used above, but you often need more advanced statistical
techniques to process the data accurately. For example, you might want
to eliminate outliers in your data. To identify the outliers, you can use
the DataFu Quantile function to compute the 10th and 90th percentile
values.
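
The percentile computation described above might look like the following minimal sketch. It assumes the SpeedDataHour relation from the previous step with a numeric speed field. The jar name datafu.jar and the relation names SpeedDataAll, SpeedPercentiles, and SpeedDataTrimmed are hypothetical and chosen only for illustration. Treating readings outside the 10th to 90th percentile range as outliers is one possible reading of the text, not necessarily the exact rule the author applies.

```pig
-- Hypothetical jar path; adjust it to the DataFu build you have installed.
REGISTER datafu.jar;

-- Bind the quantiles to compute: the 10th and 90th percentiles.
DEFINE Quantile datafu.pig.stats.Quantile('0.10', '0.90');

-- Collect every record into a single group so the percentiles cover all the data.
SpeedDataAll = GROUP SpeedDataHour ALL;

-- Quantile expects a sorted bag, so order by speed inside the nested block.
SpeedPercentiles = FOREACH SpeedDataAll {
    sorted = ORDER SpeedDataHour BY speed;
    GENERATE FLATTEN(Quantile(sorted.speed)) AS (p10:double, p90:double);
};

-- Assumption: keep only readings that fall between the two percentile values.
-- SpeedPercentiles has a single row, so its fields can be used as scalars.
SpeedDataTrimmed = FILTER SpeedDataHour
    BY (double)speed >= SpeedPercentiles.p10
   AND (double)speed <= SpeedPercentiles.p90;

DUMP SpeedDataTrimmed;
```

Grouping with ALL yields one pair of percentile values for the whole data set. Grouping BY hr instead would give a separate pair for each hour, at the cost of a join to apply them back to the individual readings.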
 
 
 