Data Research and Advanced Data Cleansing with Pig and Hive - Microsoft Big Data Solutions


Figure 9.19 Mapping output

The output from the map script is fed into the reduce script, which counts the occurrences of each log level and writes the total for each log level on a new line. Figure 9.20 shows the code for the reduce script.

Figure 9.20 Reduce script to aggregate log level counts
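
As a rough illustration of the kind of reducer described here, the following is a minimal sketch and not the script shown in Figure 9.20. It assumes the mapper emits one row per log entry with the log level as the first tab-separated field, and that rows arrive sorted by that field. The function name count_levels and the sample rows are hypothetical.

```python
from itertools import groupby


def count_levels(lines):
    """Yield 'level<TAB>count' for each run of identical log levels.

    Assumption: the first tab-separated field of each line is the log level,
    and the input is already sorted by that field, so equal levels are adjacent.
    """
    # Keep only the first field of each non-blank line.
    keys = (line.rstrip("\n").split("\t")[0] for line in lines if line.strip())
    for level, group in groupby(keys):
        yield f"{level}\t{sum(1 for _ in group)}"


if __name__ == "__main__":
    # Demo on in-memory rows (hypothetical data). Run as a streaming script,
    # the same function would be fed sys.stdin instead of this list.
    sample = ["ERROR\t1\n", "ERROR\t1\n", "INFO\t1\n"]
    for row in count_levels(sample):
        print(row)
```

Because the sketch only compares neighbouring rows, it keeps a single counter in memory at a time, but it gives wrong totals if the input is not sorted by log level.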

You combine the map and reduce scripts in your HiveQL so that the output of the mapper becomes the input of the reducer. The cluster by clause partitions and sorts the mapper output by the loglevel key, so all rows for a given log level reach the same reducer and sit next to one another. The following code processes the log files through the custom map and reduce scripts:

add file c:\sampledata\map_loglevel.py;

add file c:\sampledata\level_cnt.py;
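
To make the effect of cluster by on the reducer input more concrete, here is a small local simulation in Python. It is only a sketch: Hive does not use CRC32 to assign rows to reducers, and the function name cluster_by, the reducer count, and the sample keys are all hypothetical. What it does mirror is the general behaviour of distributing rows by key and then sorting within each partition.

```python
import zlib


def cluster_by(keys, num_reducers):
    """Simulate clustering: partition rows by key, then sort each partition.

    Assumption: CRC32 stands in for the engine's real hash function purely
    so that the result is deterministic across runs.
    """
    partitions = [[] for _ in range(num_reducers)]
    for key in keys:
        # Equal keys always hash to the same partition index.
        idx = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[idx].append(key)
    # Sorting inside a partition makes equal keys adjacent for the reducer.
    return [sorted(p) for p in partitions]


if __name__ == "__main__":
    mapper_output = ["INFO", "ERROR", "INFO", "WARN", "ERROR", "INFO"]
    for i, part in enumerate(cluster_by(mapper_output, 3)):
        print(i, part)
```

Each log level ends up in exactly one partition, which is why a reducer that simply counts adjacent identical keys can still produce a complete total per level.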
