It lists the files in the results directory on HDFS and dumps the last 10 lines of the results part file, using the
Hadoop file system cat command and the Linux tail command.
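The listing commands just described follow a simple pattern; a minimal sketch would look like the following (the results directory matches the -output path used by wordcount.sh below, and the part file name part-00000 is an assumption based on Hadoop's default naming):

```shell
# List the job output directory on HDFS
# (path and part file name assumed from wordcount.sh; adjust to your job)
hadoop fs -ls /user/hadoop/perl/results_wc

# Dump the last 10 lines of the results part file
hadoop fs -cat /user/hadoop/perl/results_wc/part-00000 | tail -10
```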
The script wordcount.sh runs the Map Reduce task by using the Map and Reduce Perl scripts:
[hadoop@hc1nn perl]$ cat wordcount.sh
#!/bin/bash

# Now run the Perl based word count

cd $HADOOP_PREFIX

hadoop jar contrib/streaming/hadoop-*streaming*.jar \
  -file /home/hadoop/perl/mapper.pl \
  -mapper /home/hadoop/perl/mapper.pl \
  -file /home/hadoop/perl/reducer.pl \
  -reducer /home/hadoop/perl/reducer.pl \
  -input /user/hadoop/edgar/* \
  -output /user/hadoop/perl/results_wc
The \ characters make the Hadoop command line more readable by breaking a single command over multiple
lines. Each -file option ships the named script with the job so that it is available on every cluster node. The -mapper
and -reducer options identify the Map and Reduce executables for the job. The -input option gives the HDFS path to
the input text data, and the -output option specifies where the job output will be placed on HDFS.
The hadoop command's jar parameter specifies which library file to use: in this case, the
streaming library. Using the last three scripts for cleaning, running, and outputting the results makes the Map Reduce
task quickly repeatable; you do not need to retype the commands! Running the script launches a Map Reduce job, as shown below:
[hadoop@hc1nn perl]$ ./wordcount.sh
packageJobJar: [/home/hadoop/perl/mapper.pl, /home/hadoop/perl/reducer.pl, /app/hadoop/tmp/hadoop-
unjar5199336797215175827/] [] /tmp/streamjob5502063820605104626.jar tmpDir=null
14/06/20 13:35:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/06/20 13:35:56 INFO mapred.FileInputFormat: Total input paths to process : 5
14/06/20 13:35:57 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
14/06/20 13:35:57 INFO streaming.StreamJob: Running job: job_201406201237_0010
14/06/20 13:35:57 INFO streaming.StreamJob: To kill this job, run:
14/06/20 13:35:57 INFO streaming.StreamJob: /usr/local/hadoop-1.2.1/libexec/../bin/hadoop job
-Dmapred.job.tracker=hc1nn:54311 -kill job_201406201237_0010
14/06/20 13:35:57 INFO streaming.StreamJob: Tracking URL: http://hc1nn:50030/jobdetails.
jsp?jobid=job_201406201237_0010
14/06/20 13:35:58 INFO streaming.StreamJob: map 0% reduce 0%
14/06/20 13:36:06 INFO streaming.StreamJob: map 20% reduce 0%
14/06/20 13:36:08 INFO streaming.StreamJob: map 60% reduce 0%
14/06/20 13:36:13 INFO streaming.StreamJob: map 100% reduce 0%
14/06/20 13:36:15 INFO streaming.StreamJob: map 100% reduce 33%
14/06/20 13:36:19 INFO streaming.StreamJob: map 100% reduce 100%
14/06/20 13:36:22 INFO streaming.StreamJob: Job complete: job_201406201237_0010
14/06/20 13:36:22 INFO streaming.StreamJob: Output: /user/hadoop/perl/results_wc
 