Running Pig Interactively with Grunt
From the bin folder of the Pig install folder (hdp\hadoop\pig\bin), open
the Pig command-line console to launch the Grunt shell. The Grunt shell
enables you to run Pig Latin interactively and view the results of each step.
Enter the following script to load and create a schema for the traffic data:
SpeedData = LOAD '/user/test/traffic.txt'
    USING PigStorage() AS (dtstamp:chararray,
    sensorid:int, speed:double);
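Before dumping the data, it can help to confirm that the schema was applied as intended. The following is a minimal sketch, assuming the SpeedData relation defined above. DESCRIBE prints the field names and types of a relation without launching a job. Note that PigStorage() with no argument splits each line on tab characters, so this assumes traffic.txt is tab-delimited.

```pig
-- Show the schema Pig has attached to the relation.
-- Expect the three fields declared in the AS clause:
-- dtstamp (chararray), sensorid (int), speed (double).
DESCRIBE SpeedData;
```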
Dump the results to the screen:
DUMP SpeedData;
Doing so runs a MapReduce job that outputs the data to the console window.
You should see data similar to Figure 9.7, which shows the tuples that make
up the data set.
Figure 9.7 Dumping results to the console window
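Because DUMP writes every tuple in the relation to the console, a large input file can flood the window. The following sketch, assuming you want only a quick look at the data, uses LIMIT to cap the number of tuples before dumping. The alias SpeedSample and the count of 10 are illustrative choices, not part of the original script.

```pig
-- Keep at most 10 tuples from SpeedData (which 10 is not guaranteed).
SpeedSample = LIMIT SpeedData 10;

-- Dump only the sample to the console.
DUMP SpeedSample;
```

This still runs a job, but the console output stays short enough to check by eye.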
Using PiggyBank to Extract Time Periods
The next step in analyzing the data is to group it into different date/time
buckets. To accomplish this, you use functions defined in the piggybank.jar
file. If that file is not already installed, you can either download and
compile the source code or download a compiled jar file from
www.wiley.com/go/microsoftbigdatasolutions.
Along with the piggybank.jar file, you need to get a copy of the
joda-time-2.2.jar