Okay, that works. The input of five space-separated words is output as five key-value pairs, one pair per word, each with a value of 1. Now, you test the Reduce function with test2.sh:
[hadoop@hc1nn perl]$ cat test2.sh
#!/bin/bash

# test the map and reduce functions together

echo "one one one two three" | ./mapper.pl | ./reducer.pl
This script pipes the output from the Map function shown above into the Reduce function:
[hadoop@hc1nn perl]$ ./test2.sh
one,3
two,1
three,1
The Reduce function correctly sums the values for matching words: three instances of the word one, followed by one each of two and three.
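For reference, minimal versions of mapper.pl and reducer.pl that would produce this behavior might look like the following sketches. They are illustrative only, not necessarily the scripts used earlier in the chapter; they use a comma as the key/value separator to match the output shown above:

#!/usr/bin/perl
# mapper.pl (sketch) -- emit "word,1" for every word read from STDIN
use strict;
use warnings;

while ( my $line = <STDIN> )
{
  chomp $line;
  foreach my $word ( split /\s+/, $line )
  {
    next if $word eq '';
    print "$word,1\n";
  }
}

#!/usr/bin/perl
# reducer.pl (sketch) -- sum the counts of consecutive identical keys
use strict;
use warnings;

my $current;
my $total = 0;

while ( my $line = <STDIN> )
{
  chomp $line;
  my ( $word, $count ) = split /,/, $line;
  next unless defined $count;
  if ( defined $current && $word eq $current )
  {
    $total += $count;
  }
  else
  {
    print "$current,$total\n" if defined $current;
    $current = $word;
    $total   = $count;
  }
}
print "$current,$total\n" if defined $current;

Note that a reducer written this way relies on identical keys arriving on consecutive lines; Hadoop's sort phase guarantees this in a real job, and the simple test input above happens to satisfy it as well.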
Now it is time to run the Hadoop streaming Map Reduce job using these Perl scripts. You create three scripts to help with this:
[hadoop@hc1nn perl]$ ls w*
wc_clean.sh wc_output.sh wordcount.sh
The script wc_clean.sh deletes the results directory on HDFS so that the Map Reduce job
can be rerun:
[hadoop@hc1nn perl]$ cat wc_clean.sh
#!/bin/bash

# Clean the hadoop perl run data directory

hadoop dfs -rmr /user/hadoop/perl/results_wc
This uses the Hadoop file system rmr command to delete the directory and its contents.
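Note that hadoop dfs is the older form of this command; on more recent Hadoop releases it is deprecated, and the equivalent would be:

hdfs dfs -rm -r /user/hadoop/perl/results_wc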
The script wc_output.sh is used to display the results of the job:
[hadoop@hc1nn perl]$ cat wc_output.sh
#!/bin/bash

# List the results directory

hadoop dfs -ls /user/hadoop/perl/results_wc

# Cat the last ten lines of the part file

hadoop dfs -cat /user/hadoop/perl/results_wc/part-00000 | tail -10
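The third script, wordcount.sh, runs the word-count job itself through the Hadoop streaming jar. Its exact contents are not shown here; as a rough sketch (the streaming jar location and the HDFS input directory below are assumptions that depend on your installation), it would look something like this:

#!/bin/bash

# Run the word-count job via Hadoop streaming (sketch only).
# The streaming jar path and the -input directory are assumptions;
# adjust them to match your Hadoop installation.

STREAM_JAR=$HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar

hadoop jar $STREAM_JAR \
 -input /user/hadoop/perl/data \
 -output /user/hadoop/perl/results_wc \
 -mapper mapper.pl \
 -reducer reducer.pl \
 -file ./mapper.pl \
 -file ./reducer.pl

The -output directory matches the one that wc_clean.sh removes and wc_output.sh lists, so a typical cycle is to run wc_clean.sh, then wordcount.sh, and finally wc_output.sh to inspect the results.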
 