You can now start Pig in interactive MapReduce mode. Run without any options, the pig command connects to Hadoop
and then opens the interactive Grunt command line:
[hadoop@hc1nn edgar]$ pig
2014-06-18 12:27:10,055 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.1 (r1585011) compiled Apr 05 2014, 01:41:34
2014-06-18 12:27:10,056 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/edgar/pig_1403051230051.log
2014-06-18 12:27:10,095 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2014-06-18 12:27:10,386 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hc1nn:54310
2014-06-18 12:27:10,750 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hc1nn:54311
grunt>
In Pig, the character sequence “--” starts a comment: everything from the -- to the end of the line is ignored.
A semicolon (;) marks the end of a Pig Latin statement. The data file is loaded from HDFS into variable A by using
the load operator:
grunt> A = load '/user/hadoop/pig/10031.txt'; -- load the text file
With a single line, you can break the text in variable A into individual words. TOKENIZE splits each line into a bag
of words, flatten turns that bag into one row per word, each word is given the field name B, and the resulting rows are
stored in variable C. By default, TOKENIZE splits on spaces and a few punctuation characters (double quote, comma,
parentheses, and asterisk). Here's the command you need:
grunt> C = foreach A generate flatten(TOKENIZE((chararray)$0)) as B; -- get list of words
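To make the tokenize-and-flatten step concrete, here is a minimal Python sketch of the same idea. It is not Pig and not how Pig implements the step. The function names tokenize and flatten_tokens and the sample lines are hypothetical, and the delimiter set is an assumption based on the default TOKENIZE delimiters.

```python
import re

# Assumption: the default TOKENIZE delimiters are space, double quote,
# comma, parentheses, and asterisk.
DELIMITERS = re.compile(r'[ ",()*]+')


def tokenize(line):
    """Return the non-empty tokens of one line (a stand-in for the bag TOKENIZE returns)."""
    return [token for token in DELIMITERS.split(line) if token]


def flatten_tokens(lines):
    """Emit one word per output row, like 'foreach A generate flatten(TOKENIZE(...))'."""
    return [word for line in lines for word in tokenize(line)]


# Hypothetical sample input standing in for the lines of the text file.
sample_lines = ["the quick fox", "the lazy dog, the end"]
print(flatten_tokens(sample_lines))
```

Each input line can produce many output rows, which is why the word list in C is usually much longer than the line list in A.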
Next, you can group identical words into variable D, and then create a list of word counts in variable E by using
the COUNT function:
grunt> D = group C by B; -- group words
grunt> E = foreach D generate COUNT(C), group; -- create word count
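The group-and-count step can be sketched in Python as well. This is only an illustration of the result the two statements produce, not of how Pig executes them on the cluster. The function name word_count and the sample word list are hypothetical.

```python
from collections import Counter


def word_count(words):
    """Return (count, word) pairs, mirroring the field order of 'COUNT(C), group'."""
    counts = Counter(words)  # groups identical words and counts each group
    return [(count, word) for word, count in counts.items()]


# Hypothetical sample input standing in for the one-word-per-row data in C.
words = ["the", "quick", "fox", "the", "lazy", "dog", "the", "end"]

# Print the most frequent words first, breaking ties alphabetically.
for count, word in sorted(word_count(words), key=lambda pair: (-pair[0], pair[1])):
    print((count, word))
```

As in the Pig statement, the count comes first in each pair and the grouped word second.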
To view the word count, use the dump command, which displays the contents of variable E in the session window.
(I've listed only the last few lines here.) As you can see, it's very basic counting; URLs and punctuated fragments
are counted as words:
grunt> dump E; -- dump result to session
.......
(1,http://pglaf.org/fundraising.)
(1,it!--listen--now--listen!--the)
(1,http://www.gutenberg.net/GUTINDEX.ALL)
(1,http://www.gutenberg.net/1/0/2/3/10234)
(1,http://www.gutenberg.net/2/4/6/8/24689)
(1,http://www.gutenberg.net/1/0/0/3/10031/)
(1,http://www.ibiblio.org/gutenberg/etext06)
(0,)
 