2013-12-10 01:48:11,279 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine
- Connecting to map-reduce job tracker at: jobtrackerhost:9010
grunt>
Let's execute a series of Pig statements to parse the sample.log file, which is present by default in the
/example/data/ folder of WASB containers. The first statement loads the file content into a Pig variable called LOGS:
LOGS = LOAD 'wasb:///example/data/sample.log';
Then we will create a variable LEVELS that categorizes each entry in the LOGS variable by its log level
(TRACE, DEBUG, INFO, WARN, ERROR, or FATAL). For example:
LEVELS = foreach LOGS generate REGEX_EXTRACT($0, '(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)', 1) as LOGLEVEL;
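To see what this extraction does to a single line, here is a minimal Python sketch; it is not Pig itself. It assumes that REGEX_EXTRACT behaves like a regular-expression search that returns the first capture group, or null when nothing matches. The sample lines and the extract_level helper are hypothetical and are not taken from sample.log.

```python
import re

# Same alternation as in the Pig statement above.
LEVEL_PATTERN = re.compile(r"(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)")


def extract_level(line):
    """Return the first log level found in the line, or None if there is none."""
    match = LEVEL_PATTERN.search(line)
    return match.group(1) if match else None


# Hypothetical log lines, for illustration only.
sample_lines = [
    "2012-02-03 18:35:34 SampleClass6 [INFO] everything normal",
    "java.lang.Exception: stack trace line without a level",
    "2012-02-03 18:55:54 SampleClass4 [FATAL] system problem",
]

levels = [extract_level(line) for line in sample_lines]
print(levels)
```

Lines that carry no level keyword come out as None here, which corresponds to the null entries that are filtered out in the next Pig statement.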
Next, we can filter out the null entries from LEVELS and store the result in the FILTEREDLEVELS variable:
FILTEREDLEVELS = FILTER LEVELS by LOGLEVEL is not null;
After that, we can group the filtered entries by log level and store them in the GROUPEDLEVELS variable:
GROUPEDLEVELS = GROUP FILTEREDLEVELS by LOGLEVEL;
Next, we count the number of occurrences of each entry type and store the counts in the FREQUENCIES variable.
For example:
FREQUENCIES = foreach GROUPEDLEVELS generate group as LOGLEVEL,
COUNT(FILTEREDLEVELS.LOGLEVEL) as COUNT;
Then we arrange the grouped entries in descending order of their number of occurrences and store them in the
RESULT variable. Here's how to sort in that order:
RESULT = order FREQUENCIES by COUNT desc;
Finally, we can print the value of the RESULT variable using the DUMP command. Note that this is the point
where the actual MapReduce job is triggered to process and fetch the data. Here's the command:
DUMP RESULT;
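The whole pipeline can also be mirrored locally to check the logic on a handful of lines. The following is a rough Python sketch, not a replacement for the Pig job. The count_levels function and the sample lines are hypothetical, and each step is labeled with the Pig variable it stands in for.

```python
import re
from collections import Counter

LEVEL_PATTERN = re.compile(r"(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)")


def count_levels(lines):
    """Return (level, count) pairs sorted by count, highest first."""
    # LEVELS: extract the log level from each line (None when absent).
    levels = []
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        levels.append(match.group(1) if match else None)

    # FILTEREDLEVELS: drop the entries that have no level.
    filtered = [level for level in levels if level is not None]

    # GROUPEDLEVELS and FREQUENCIES: group by level and count each group.
    frequencies = Counter(filtered)

    # RESULT: order by count, descending.
    return frequencies.most_common()


# Hypothetical log lines, for illustration only.
sample_lines = [
    "[INFO] started",
    "[ERROR] failed to open file",
    "[INFO] retrying",
    "continuation line without a level",
    "[WARN] slow response",
    "[ERROR] failed again",
    "[INFO] done",
]

result = count_levels(sample_lines)
for level, count in result:
    print(level, count)
```

Unlike this sketch, which runs each step immediately, Pig defers the work until DUMP is issued.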
On successful execution of the Pig statements, you should see output in which the log entries are grouped by
log level and arranged in descending order of their number of occurrences. Such output is shown in Listing 6-13.
Listing 6-13. The Pig job output
Input(s):
Successfully read 1387 records (404 bytes) from: "wasb:///example/data/sample.log"
Output(s):
Successfully stored 6 records in: "wasb://democlustercontainer@democluster.blob.core.windows.net/tmp/temp167788958/tmp-1711466614"