11 gwords = group words by word ;
12
13 -- create a word count
14
15 wcount = foreach gwords generate group, COUNT(words) ;
16
17 -- store the word count
18
19 store wcount into '/user/hadoop/pig/wc_result1' ;
I also added comments, line numbers, and meaningful names for the variables. These modifications can help
when you're trying to determine what a script is doing. They also help to tie this example to the work in the next
section, on Pig user-defined functions.
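To picture what the grouping and counting steps on lines 11 and 15 of the script do, the following is a minimal Python sketch, not Pig code. The function name group_and_count and the sample list are hypothetical and exist only for illustration; the sketch assumes each input item is a single word, as the relation words is in the script.

```python
from collections import defaultdict


def group_and_count(words):
    """Mimic 'group words by word' followed by COUNT on each group."""
    groups = defaultdict(list)
    for word in words:
        # Like gwords: each distinct word keys a bag of its occurrences.
        groups[word].append(word)
    # Like wcount: one (group, count) pair per distinct word.
    return {key: len(bag) for key, bag in groups.items()}


# Hypothetical sample data; grouping is case-sensitive, as in the Pig script.
sample = ["the", "Dream", "the", "dream"]
counts = group_and_count(sample)
```

Because the grouping key is the raw word, "Dream" and "dream" end up in separate groups in this sketch, which matches the behavior of the script's output discussed later in this section.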
Instead of invoking the interactive Grunt command line, you invoke Pig with the name of the file containing
the Pig script. Pig uses Map Reduce mode by default, so it reads from and writes to HDFS. The output is stored in the
HDFS directory /user/hadoop/pig/wc_result1/. When the script runs, Pig launches a Map Reduce job.
[hadoop@hc1nn pig]$ pig wordcount.pig
...................
Counters:
Total records written : 13219
Total bytes written : 137870
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201406181226_0003
2014-06-18 13:27:49,446 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
As mentioned previously, you can use Hadoop and Linux commands to output the word count:
[hadoop@hc1nn pig]$ hadoop dfs -cat /user/hadoop/pig/wc_result1/part-r-00000 | tail -10
1 http://gutenberg.net/license
1 Dream'--Prospero--Oberon--and
1 http://pglaf.org/fundraising .
1 it!--listen--now--listen!--the
1 http://www.gutenberg.net/GUTINDEX.ALL
1 http://www.gutenberg.net/1/0/2/3/10234
1 http://www.gutenberg.net/2/4/6/8/24689
1 http://www.gutenberg.net/1/0/0/3/10031/
1 http://www.ibiblio.org/gutenberg/etext06
Notice that in both the interactive and batch script versions, the count includes non-alphanumeric characters
such as “:” (colon), and that the case of the words has not been standardized. For instance, the word Dream (with a
capital D) is counted as its own entry. In the next section, you will learn how to add greater selectivity to Pig by
creating user-defined functions.
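To make the effect of standardizing case and removing non-alphanumeric characters concrete, here is a minimal Python sketch. It is not Pig code and not the user-defined function developed in the next section; the names normalize and clean_word_count, the regular expression, and the sample tokens are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def normalize(token):
    """Lowercase a token and drop every character that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", token.lower())


def clean_word_count(tokens):
    """Count tokens after normalization, skipping any that become empty."""
    cleaned = (normalize(t) for t in tokens)
    return Counter(w for w in cleaned if w)


# Hypothetical tokens: mixed case, a trailing colon, and pure punctuation.
sample_tokens = ["Dream", "dream:", "dream", "--"]
clean_counts = clean_word_count(sample_tokens)
```

In this sketch the three variants of the word collapse into a single entry and the punctuation-only token is discarded. Note that simply deleting punctuation would also fuse a token such as Dream'--Prospero--Oberon--and into one long word, so a real cleaning step would probably need to split on such characters as well.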
 