Data Research and Advanced Data Cleansing with Pig and Hive - Microsoft Big Data Solutions

Running Pig Interactively with Grunt

From the bin folder of the Pig install folder (hdp\hadoop\pig\bin), open the Pig command-line console to launch the Grunt shell. The Grunt shell enables you to run Pig Latin interactively and view the results of each step.

Enter the following script to load and create a schema for the traffic data:

SpeedData = LOAD '/user/test/traffic.txt'
    USING PigStorage()
    AS (dtstamp:chararray, sensorid:int, speed:double);
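The following Python sketch approximates what the AS clause does to each line of traffic.txt. It is a simplified model, not Pig's implementation. It splits a line on tabs, which is PigStorage's default delimiter, and casts each field to the type declared in the schema. As we understand Pig's schema handling, a field that fails to cast becomes null (None here), missing trailing fields are padded with null, and extra fields are dropped. Pig's own casting rules have more edge cases than this sketch covers. The function name parse_pigstorage_line and the sample lines are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Mirrors the AS clause: (dtstamp:chararray, sensorid:int, speed:double).
# Each entry pairs a field name with a Python converter standing in for the Pig type.
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("dtstamp", str),    # chararray
    ("sensorid", int),   # int
    ("speed", float),    # double
]


def parse_pigstorage_line(line: str, delimiter: str = "\t") -> Tuple[Optional[object], ...]:
    """Hypothetical helper: split one text line on the delimiter (tab by
    default, as with PigStorage()) and cast each field to its declared type.

    Simplified assumptions: a failed cast yields None (Pig's null), missing
    trailing fields are padded with None, and extra fields are ignored.
    """
    fields = line.rstrip("\r\n").split(delimiter)
    values: List[Optional[object]] = []
    for i, (_name, convert) in enumerate(SCHEMA):
        if i >= len(fields):
            values.append(None)          # pad a missing field with null
            continue
        try:
            values.append(convert(fields[i]))
        except ValueError:
            values.append(None)          # cast failure becomes null
    return tuple(values)


if __name__ == "__main__":
    # Hypothetical sample lines; the real contents of traffic.txt are not shown here.
    sample = [
        "2013-01-01 06:00:00\t1\t55.5",
        "2013-01-01 06:05:00\t2\tn/a",
    ]
    for row in sample:
        print(parse_pigstorage_line(row))
```

Modeling a failed cast as null rather than raising an error reflects how Pig handles bad values. A malformed speed reading ends up as an empty field, and the job keeps running.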

Dump the results to the screen:

DUMP SpeedData;

This runs a MapReduce job that writes the data to the console window. You should see output similar to Figure 9.7, which shows the tuples that make up the data set.

Figure 9.7 Dumping results to the console window
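The following Python sketch shows the tuple format that DUMP writes to the console. Each tuple appears on its own line as comma-separated fields inside parentheses, and null fields are left empty. The helper names format_pig_tuple and dump and the sample rows are hypothetical. The sketch also ignores differences between Python's and Java's number formatting, which show up for very large or very small doubles.

```python
from typing import Iterable, Optional, Sequence


def format_pig_tuple(values: Sequence[Optional[object]]) -> str:
    """Hypothetical helper: render one tuple roughly as Pig's DUMP prints it,
    with comma-separated fields inside parentheses and None (null) as an
    empty field."""
    return "(" + ",".join("" if v is None else str(v) for v in values) + ")"


def dump(relation: Iterable[Sequence[Optional[object]]]) -> None:
    """Print every tuple of a relation, one per line."""
    for row in relation:
        print(format_pig_tuple(row))


if __name__ == "__main__":
    # Hypothetical tuples in the SpeedData schema (dtstamp, sensorid, speed).
    speed_data = [
        ("2013-01-01 06:00:00", 1, 55.5),
        ("2013-01-01 06:05:00", 2, None),
    ]
    dump(speed_data)
```

Running the sketch prints (2013-01-01 06:00:00,1,55.5) and (2013-01-01 06:05:00,2,) on separate lines. In the second line, the trailing comma marks the null speed reading.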

Using PiggyBank to Extract Time Periods

The next step in analyzing the data is to group it into different date/time buckets. To accomplish this, you use functions defined in the piggybank.jar file. If that file is not already installed, you can either download and compile the source code or download a compiled jar file from www.wiley.com/go/microsoftbigdatasolutions. Along with the piggybank.jar file, you need to get a copy of the joda-time-2.2.jar
