28 if ( $word eq $oldword )
29 {
30 $sumval += $value ;
31 }
32 else
33 {
34 if ( $oldword ne "" )
35 {
36 print "$oldword,$sumval\n" ;
37 }
38 $sumval = 1 ;
39 }
40
41 # now print the name value pairs
42
43 $oldword = $word ;
44 }
45
46 # remember to print last word
47
48 print "$oldword,$sumval\n" ;
The reducer.pl Perl script receives data from the mapper.pl script and splits each STDIN (standard input) line into
a word,1 key-value pair (at line 21). It then groups identical words and accumulates their counts between lines 28 and
39. Finally, it outputs key-value pairs in the form word,count at lines 36 and 48.
You already have some basic text files on HDFS under the directory /user/hadoop/edgar on which you can run the
Perl word-count example. Check the data using the Hadoop file system ls command to be sure that it is ready to use:
[hadoop@hc1nn python]$ hadoop dfs -ls /user/hadoop/edgar
Found 5 items
-rw-r--r-- 1 hadoop supergroup 410012 2014-06-15 15:53 /user/hadoop/edgar/10031.txt
-rw-r--r-- 1 hadoop supergroup 559352 2014-06-15 15:53 /user/hadoop/edgar/15143.txt
-rw-r--r-- 1 hadoop supergroup 66401 2014-06-15 15:53 /user/hadoop/edgar/17192.txt
-rw-r--r-- 1 hadoop supergroup 596736 2014-06-15 15:53 /user/hadoop/edgar/2149.txt
-rw-r--r-- 1 hadoop supergroup 63278 2014-06-15 15:53 /user/hadoop/edgar/932.txt
The test1.sh shell script tests the Map function on the Linux command line to ensure that it works, emitting a count
of 1 for each word in the input string:
[hadoop@hc1nn perl]$ cat test1.sh
01 #!/bin/bash
02
03 # test the mapper
04
05 echo "one one one two three" | ./mapper.pl
[hadoop@hc1nn perl]$ ./test1.sh
one,1
one,1
one,1
two,1
three,1