ordered = ORDER counts BY $0;
STORE ordered INTO 'output/pigout' USING PigStorage;
The final line will kick off a MapReduce job and store the output in the
pigout folder.
6. Run the following command to read the output of the Pig job:
hadoop fs -cat output/pigout/part-r-00000
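PigStorage writes tab-separated fields by default, so each line of the part file pairs a word with its count. A hypothetical fragment of the output (the words and counts shown here are illustrative, not from an actual run) might look like:

```
apple	3
banana	7
cherry	1
```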
The output lists every word in the document along with the number of times it appears. You will get more exposure to Pig in later chapters, but this should give you a good idea of the procedural way Pig moves data through a pipeline and transforms it into a useful final state. Of course, you could also have saved the preceding code as a text file and run it from Pig as a script. You will learn how to do this in Chapter 8.
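To see how the two lines above fit into the whole pipeline, here is a sketch of what a complete wordcount script might look like. The input path, output path, and the relation names before `counts` (here `lines`, `words`, and `grouped`) are assumptions for illustration; only the `ordered` and `STORE` lines come from the text above.

```pig
-- wordcount.pig (hypothetical file name)
-- Load each line of the input file as a single chararray field.
lines = LOAD 'input/sample.txt' AS (line:chararray);
-- Split each line into words, one word per record.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words together and count each group.
grouped = GROUP words BY word;
counts = FOREACH grouped GENERATE group, COUNT(words);
-- Sort by the word (field $0) and write the result to HDFS.
ordered = ORDER counts BY $0;
STORE ordered INTO 'output/pigout' USING PigStorage;
```

A script like this could then be launched from the command line with `pig wordcount.pig` rather than typing each statement into the Grunt shell.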
Summary
In this chapter, you learned how to install Hortonworks Data Platform into a
single-node cluster, how to configure HDInsight Service in Windows Azure,
and how to use the tools available in each to quickly validate their installs.
You were also introduced to moving data into HDFS and the tools available
to move data into your Azure storage. Finally, this chapter gave you a primer
in Hive and Pig so that you can quickly verify that they are running as
expected in your environment.