Lastly, you can store the word-count results, currently held in the relation E , on HDFS in /user/hadoop/pig/wc_result:
grunt> store E into '/user/hadoop/pig/wc_result' ; -- store the results
grunt> quit ; -- quit interactive session
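One point to note: the store statement fails if its output directory already exists on HDFS, so if you rerun the session you must delete the previous result first. A minimal sketch, using the same Hadoop file system shell as the other examples here (this command needs a running cluster):

```shell
# remove a previous word-count result so a rerun of the store statement can succeed
hadoop dfs -rmr /user/hadoop/pig/wc_result
```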
Having quit the Pig interactive session, you can examine the results of this Pig job on HDFS. The Hadoop file
system ls command shows a success file (_SUCCESS), a part file (part-r-00000) containing the word-count data, and a
logs directory (_logs). You can then list the part file with the Hadoop file system cat command, piping its output
through the Linux tail command to view the last 10 lines of the file. Both commands are shown here:
[hadoop@hc1nn edgar]$ hadoop dfs -ls /user/hadoop/pig/wc_result
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2014-06-18 13:08 /user/hadoop/pig/wc_result/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-06-18 13:08 /user/hadoop/pig/wc_result/_logs
-rw-r--r-- 1 hadoop supergroup 137870 2014-06-18 13:08 /user/hadoop/pig/wc_result/part-r-00000
[hadoop@hc1nn edgar]$ hadoop dfs -cat /user/hadoop/pig/wc_result/part-r-00000 | tail -10
1 http://gutenberg.net/license
1 Dream'--Prospero--Oberon--and
1 http://pglaf.org/fundraising .
1 it!--listen--now--listen!--the
1 http://www.gutenberg.net/GUTINDEX.ALL
1 http://www.gutenberg.net/1/0/2/3/10234
1 http://www.gutenberg.net/2/4/6/8/24689
1 http://www.gutenberg.net/1/0/0/3/10031/
1 http://www.ibiblio.org/gutenberg/etext06
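As the listing shows, the counts in the part file are unsorted. If you wanted the most frequent words first, you could add an order step before the store; a minimal sketch, assuming (as in the output above) that the count is the first field of each tuple in E , and using a hypothetical output path wc_sorted:

```pig
F = order E by $0 desc ; -- sort tuples on the count field, highest first
store F into '/user/hadoop/pig/wc_sorted' ; -- example output path
```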
It is quite impressive that, with five lines of Pig commands (ignoring the dump and quit lines), you can run the
same word-count algorithm that took 70 lines of Java code. Less code means lower development costs and, we all hope,
fewer code-based errors.
While efficient, the interactive Pig example does have a drawback: the commands must be typed manually each
time you want to run a word count, and once you quit the session, they are lost. The answer to this problem, of course,
is to store the Pig script in a file and run it as a batch MapReduce job. To demonstrate, I placed the Pig commands
from the previous example into the wordcount.pig file:
[hadoop@hc1nn pig]$ ls -l
total 4
-rw-rw-r--. 1 hadoop hadoop 313 Jun 18 13:24 wordcount.pig
[hadoop@hc1nn pig]$ cat wordcount.pig
-- get raw line data from file

rlines = load '/user/hadoop/pig/10031.txt';

-- get list of words

words = foreach rlines generate flatten(TOKENIZE((chararray)$0)) as word;

-- group the words by word value