ordered = ORDER counts BY $0;
STORE ordered INTO 'output/pigout' USING PigStorage;
The final line will kick off a MapReduce job and store the output in the
pigout folder.
6. Run the following command to read the output of the Pig job:
hadoop fs -cat output/pigout/part-r-00000
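PigStorage writes tab-separated fields by default, so each line of the part file pairs a word with its count. A hypothetical fragment of the output (the words and counts shown here are illustrative, not from an actual run) might look like:

```
apple	3
banana	7
cherry	1
```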
The output lists every word in the document along with the number of times it appears. You will get more exposure to Pig in later chapters, but this should give you a good idea of the procedural way Pig moves data through a pipeline and transforms it into a useful final state. Of course, you could also have saved the preceding code as a text file and run it from Pig as a script. You will learn how to do this in Chapter 8.
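To see how the two lines above fit into the whole pipeline, here is a sketch of what a complete wordcount script might look like. The input path, output path, and the relation names before `counts` (here `lines`, `words`, and `grouped`) are assumptions for illustration; only the `ordered` and `STORE` lines come from the text above.

```pig
-- wordcount.pig (hypothetical file name)
-- Load each line of the input file as a single chararray field.
lines = LOAD 'input/sample.txt' AS (line:chararray);
-- Split each line into words, one word per record.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words together and count each group.
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group, COUNT(words);
-- Sort by the word (field $0) and write the result to HDFS.
ordered = ORDER counts BY $0;
STORE ordered INTO 'output/pigout' USING PigStorage;
```

A script like this could then be launched from the command line with `pig wordcount.pig` rather than typing each statement into the Grunt shell.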
Summary
In this chapter, you learned how to install Hortonworks Data Platform into a
single-node cluster, how to configure HDInsight Service in Windows Azure,
and how to use the tools available in each to quickly validate their installs.
You were also introduced to moving data into HDFS and the tools available
to move data into your Azure storage. Finally, this chapter gave you a primer
in Hive and Pig so that you can quickly verify that they are running as
expected in your environment.