It lists the files in the results directory on HDFS and dumps the last 10 lines of the results part file using the Hadoop file system cat command and the Linux tail command.
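The inspection step described above can be sketched as a short shell script. The results path matches the -output option of wordcount.sh below, but the part file name (part-00000) is an assumption based on the default streaming output naming:

```shell
#!/bin/bash
# List the word-count results directory on HDFS
# (path taken from the -output option of wordcount.sh).
hadoop fs -ls /user/hadoop/perl/results_wc

# Dump the last 10 lines of the results part file:
# hadoop fs -cat streams the file, and the Linux tail
# command keeps only the final 10 lines.
hadoop fs -cat /user/hadoop/perl/results_wc/part-00000 | tail -10
```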
The script wordcount.sh runs the Map Reduce task by using the Map and Reduce Perl scripts:
[hadoop@hc1nn perl]$ cat wordcount.sh
#!/bin/bash

# Now run the Perl based word count

cd $HADOOP_PREFIX

hadoop jar contrib/streaming/hadoop-*streaming*.jar \
    -file /home/hadoop/perl/mapper.pl \
    -mapper /home/hadoop/perl/mapper.pl \
    -file /home/hadoop/perl/reducer.pl \
    -reducer /home/hadoop/perl/reducer.pl \
    -input /user/hadoop/edgar/* \
    -output /user/hadoop/perl/results_wc
The \ characters make the Hadoop command line more readable by breaking a single command over multiple lines. The -file options ship the named scripts with the job so that they are available on the cluster nodes. The -mapper and -reducer options identify the Map and Reduce functions for the job. The -input option gives the HDFS path to the input text data, and the -output option specifies where the job output will be placed on HDFS. The hadoop jar parameter specifies which library file to use, in this case the streaming library. Using the last three scripts for cleaning, running, and outputting the results makes the Map Reduce task quickly repeatable; you do not need to retype the commands! The output is a Map Reduce job, as shown below:
[hadoop@hc1nn perl]$ ./wordcount.sh
packageJobJar: [/home/hadoop/perl/mapper.pl, /home/hadoop/perl/reducer.pl, /app/hadoop/tmp/hadoop-
unjar5199336797215175827/] [] /tmp/streamjob5502063820605104626.jar tmpDir=null
14/06/20 13:35:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/06/20 13:35:56 INFO mapred.FileInputFormat: Total input paths to process : 5
14/06/20 13:35:57 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]
14/06/20 13:35:57 INFO streaming.StreamJob: Running job: job_201406201237_0010
14/06/20 13:35:57 INFO streaming.StreamJob: To kill this job, run:
14/06/20 13:35:57 INFO streaming.StreamJob: /usr/local/hadoop-1.2.1/libexec/../bin/hadoop job
-Dmapred.job.tracker=hc1nn:54311 -kill job_201406201237_0010
14/06/20 13:35:57 INFO streaming.StreamJob: Tracking URL:
http://hc1nn:50030/jobdetails.
14/06/20 13:35:58 INFO streaming.StreamJob: map 0% reduce 0%
14/06/20 13:36:06 INFO streaming.StreamJob: map 20% reduce 0%
14/06/20 13:36:08 INFO streaming.StreamJob: map 60% reduce 0%
14/06/20 13:36:13 INFO streaming.StreamJob: map 100% reduce 0%
14/06/20 13:36:15 INFO streaming.StreamJob: map 100% reduce 33%
14/06/20 13:36:19 INFO streaming.StreamJob: map 100% reduce 100%
14/06/20 13:36:22 INFO streaming.StreamJob: Job complete: job_201406201237_0010
14/06/20 13:36:22 INFO streaming.StreamJob: Output: /user/hadoop/perl/results_wc