You can run the job via the Hadoop jar command. The parameters passed to it are the library file you have just
created, the name of the class to run in that library, the input directory on HDFS, and the output directory:
[hadoop@hc1nn wordcount]$ hadoop jar ./wordcount1.jar org.myorg.WordCount /user/hadoop/edgar /user/hadoop/edgar-results
14/06/15 16:04:50 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/06/15 16:04:50 INFO mapred.FileInputFormat: Total input paths to process : 5
14/06/15 16:04:51 INFO mapred.JobClient: Running job: job_201406151602_0001
14/06/15 16:04:52 INFO mapred.JobClient: map 0% reduce 0%
14/06/15 16:05:02 INFO mapred.JobClient: map 20% reduce 0%
14/06/15 16:05:03 INFO mapred.JobClient: map 40% reduce 0%
14/06/15 16:05:04 INFO mapred.JobClient: map 60% reduce 0%
........................
14/06/15 16:05:19 INFO mapred.JobClient: Combine input records=284829
14/06/15 16:05:19 INFO mapred.JobClient: Reduce input records=55496
14/06/15 16:05:19 INFO mapred.JobClient: Reduce input groups=36348
14/06/15 16:05:19 INFO mapred.JobClient: Combine output records=55496
14/06/15 16:05:19 INFO mapred.JobClient: Physical memory (bytes) snapshot=912035840
14/06/15 16:05:19 INFO mapred.JobClient: Reduce output records=36348
14/06/15 16:05:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=7949012992
14/06/15 16:05:19 INFO mapred.JobClient: Map output records=284829
The job has completed (the output shown above has been trimmed), so you can check the output on HDFS
under /user/hadoop/edgar-results/ by using the Hadoop file system ls command:
[hadoop@hc1nn wordcount]$ hadoop dfs -ls /user/hadoop/edgar-results/
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2014-06-15 16:05 /user/hadoop/edgar-results/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-06-15 16:04 /user/hadoop/edgar-results/_logs
-rw-r--r-- 1 hadoop supergroup 396500 2014-06-15 16:05 /user/hadoop/edgar-results/part-00000
These results include a _SUCCESS file, so the job completed without error. As in previous examples, you use the
Hadoop file system cat command to dump the contents of the results file, piping it through the Linux head command to
limit the output to the first 10 rows:
[hadoop@hc1nn wordcount]$ hadoop dfs -cat /user/hadoop/edgar-results/part-00000 | head -10
!) 1
"''T 1
"'And 1
"'As 1
"'Be 2
"'But--still--monsieur----' 1
"'Catherine, 1
"'Comb 1
"'Come 1
"'Eyes,' 1
Well done! You have just compiled and run your own native Map Reduce job from a source file. To create more,
you can simply change the Java algorithm (or write your own) and follow the same process. One change that might
be useful is to ignore white-space and symbol characters when counting the words; as the output above shows, the
results contain tokens polluted with characters such as " and -. The next example adds these refinements, along the
lines of the sketch below.
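As a rough illustration of that refinement, the mapper could strip non-alphanumeric characters from each token before emitting it. The sketch below assumes the classic WordCount mapper structure from the Hadoop tutorial, using the same old mapred API that the job output above reports; the class name CleanWordMapper and the replaceAll pattern are illustrative choices here, not the exact code of the next example:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CleanWordMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    StringTokenizer tokenizer = new StringTokenizer(value.toString());
    while (tokenizer.hasMoreTokens()) {
      // Drop anything that is not a letter or digit, so tokens such as
      // "'Come or !) from the results above collapse to plain words.
      String cleaned = tokenizer.nextToken().replaceAll("[^a-zA-Z0-9]", "");
      if (!cleaned.isEmpty()) {
        word.set(cleaned);
        output.collect(word, one);
      }
    }
  }
}

Wiring this in is simply a matter of pointing the job configuration's setMapperClass at this class instead of the original mapper; adding a toLowerCase() call to the cleaned token would additionally merge case variants of the same word.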
 