Now you are ready to run this extended version of the Java Map Reduce task. The library that was just created is
passed to the hadoop jar command, followed by the name of the class to be called within that library. Next, the -D
option sets wordcount.case.sensitive to false, which switches case sensitivity off. After that, the input data file and
the output directory on HDFS are listed. Finally, the -skip option names a skip file, whose patterns remove any
unwanted characters from the data processed:
[hadoop@hc1nn wordcount]$ hadoop jar ./wordcount1.jar org.myorg.WordCount \
    -Dwordcount.case.sensitive=false /user/hadoop/edgar/10031.txt \
    /user/hadoop/edgar-results -skip /user/hadoop/java/patterns.txt
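To make the effect of the two options concrete, here is a minimal plain-Java sketch of what they are assumed to do to each line before the words are counted. It assumes the mapper lowercases the line when case sensitivity is off and then deletes every match of each skip pattern, in the way the stock Apache WordCount example does. The class name, the method name, and the three skip patterns are hypothetical, since the contents of patterns.txt are not shown here, and no Hadoop classes are involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration only: no Hadoop classes are used here.
class WordCountOptionsSketch {

    // Counts whitespace-separated tokens in one line of text.
    // caseSensitive mirrors -Dwordcount.case.sensitive; skipPatterns stands in
    // for the regular expressions assumed to be listed in the -skip file.
    static Map<String, Integer> countWords(String line, boolean caseSensitive,
                                           List<String> skipPatterns) {
        // Fold case first, so skip patterns are matched against lowercase text
        // whenever case sensitivity is switched off.
        String text = caseSensitive ? line : line.toLowerCase();

        // Delete every match of every skip pattern.
        for (String pattern : skipPatterns) {
            text = text.replaceAll(pattern, "");
        }

        // Tally the remaining tokens; TreeMap keeps the keys sorted.
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical skip patterns; the real patterns.txt is not shown in the text.
        List<String> skip = Arrays.asList("\\.", ",", "!");
        // Prints {raven=3, the=3}
        System.out.println(countWords("The raven, the Raven. THE RAVEN!", false, skip));
    }
}
```

With case sensitivity left on and the same skip patterns, the same line would yield six separate keys (The, the, THE, raven, Raven, RAVEN), each with a count of one.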
The command produces the following Map Reduce task output:
14/06/21 17:40:06 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/06/21 17:40:06 INFO mapred.FileInputFormat: Total input paths to process : 1
14/06/21 17:40:07 INFO mapred.JobClient: Running job: job_201406211041_0004
14/06/21 17:40:08 INFO mapred.JobClient: map 0% reduce 0%
14/06/21 17:40:15 INFO mapred.JobClient: map 50% reduce 0%
14/06/21 17:40:23 INFO mapred.JobClient: map 100% reduce 16%
14/06/21 17:40:30 INFO mapred.JobClient: map 100% reduce 100%
14/06/21 17:40:31 INFO mapred.JobClient: Job complete: job_201406211041_0004
14/06/21 17:40:31 INFO mapred.JobClient: Counters: 32
14/06/21 17:40:31 INFO mapred.JobClient: Job Counters
14/06/21 17:40:31 INFO mapred.JobClient: Launched reduce tasks=1
14/06/21 17:40:31 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=17198
14/06/21 17:40:31 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
............................
14/06/21 17:40:31 INFO mapred.JobClient: CPU time spent (ms)=5880
14/06/21 17:40:31 INFO mapred.JobClient: Map input bytes=410012
14/06/21 17:40:31 INFO mapred.JobClient: SPLIT_RAW_BYTES=198
14/06/21 17:40:31 INFO mapred.JobClient: Combine input records=63590
14/06/21 17:40:31 INFO mapred.JobClient: Reduce input records=12581
14/06/21 17:40:31 INFO mapred.JobClient: Reduce input groups=9941
14/06/21 17:40:31 INFO mapred.JobClient: Combine output records=12581
14/06/21 17:40:31 INFO mapred.JobClient: Physical memory (bytes) snapshot=404115456
14/06/21 17:40:31 INFO mapred.JobClient: Reduce output records=9941
14/06/21 17:40:31 INFO mapred.JobClient: Virtual memory (bytes) snapshot=4109373440
14/06/21 17:40:31 INFO mapred.JobClient: Map output records=63590
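The record counters in this output can be cross-checked with a little arithmetic. Map output records equals Combine input records (63590), Combine output records equals Reduce input records (12581), and Reduce input groups equals Reduce output records (9941). The plain-Java sketch below turns those figures into two ratios. The class and method names are illustrative and are not part of Hadoop; the numbers are copied from the log above.

```java
import java.util.Locale;

// Illustrative helper: derives simple ratios from the job counters.
class CounterRatiosSketch {

    // Fraction of records that the combiner removed before the shuffle.
    static double combinerReduction(long combineInput, long combineOutput) {
        return 1.0 - (double) combineOutput / combineInput;
    }

    // Average number of map output records per distinct key (word).
    static double averageOccurrences(long mapOutputRecords, long reduceOutputRecords) {
        return (double) mapOutputRecords / reduceOutputRecords;
    }

    public static void main(String[] args) {
        long mapOutputRecords = 63590;     // Map output records = Combine input records
        long combineOutputRecords = 12581; // Combine output records = Reduce input records
        long reduceOutputRecords = 9941;   // Reduce output records = Reduce input groups

        System.out.println(String.format(Locale.ROOT, "Combiner reduction: %.1f%%",
                100.0 * combinerReduction(mapOutputRecords, combineOutputRecords)));
        System.out.println(String.format(Locale.ROOT, "Average occurrences per word: %.2f",
                averageOccurrences(mapOutputRecords, reduceOutputRecords)));
    }
}
```

For this run the combiner removes roughly 80% of the records before they reach the reducer, and each distinct word occurs about 6.4 times on average. The reducer still receives more records (12581) than there are distinct words (9941), which is consistent with more than one map task emitting a partial count for the same word.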
Check the results directory on HDFS by using the Hadoop file system ls command. The presence of a _SUCCESS
file shows that the job completed successfully; the word counts themselves are in the part-00000 file:
[hadoop@hc1nn wordcount]$ hadoop dfs -ls /user/hadoop/edgar-results
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2014-06-21 17:40 /user/hadoop/edgar-results/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-06-21 17:40 /user/hadoop/edgar-results/_logs
-rw-r--r-- 1 hadoop supergroup 103300 2014-06-21 17:40 /user/hadoop/edgar-results/part-00000
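If a later step should only run once the job has succeeded, the same check can be scripted. The following is a minimal plain-Java sketch that scans listing lines such as those above for a _SUCCESS entry. It assumes each listing line ends with the full HDFS path, as in the output shown; the class and method names are hypothetical, and the sketch works on captured text rather than calling HDFS itself.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper: inspects captured "ls" output for the success marker.
class SuccessMarkerSketch {

    // Returns true when any listing line names a file called _SUCCESS.
    // Each line is assumed to end with the full HDFS path.
    static boolean hasSuccessMarker(List<String> listingLines) {
        for (String line : listingLines) {
            String trimmed = line.trim();
            // The path is the last space-separated field on the line.
            String path = trimmed.substring(trimmed.lastIndexOf(' ') + 1);
            // The file name is whatever follows the final slash.
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (name.equals("_SUCCESS")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
            "Found 3 items",
            "-rw-r--r-- 1 hadoop supergroup 0 2014-06-21 17:40 /user/hadoop/edgar-results/_SUCCESS",
            "drwxr-xr-x - hadoop supergroup 0 2014-06-21 17:40 /user/hadoop/edgar-results/_logs",
            "-rw-r--r-- 1 hadoop supergroup 103300 2014-06-21 17:40 /user/hadoop/edgar-results/part-00000");
        // Prints true
        System.out.println(hasSuccessMarker(listing));
    }
}
```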
 