Building the code into a jar library using the jar command creates the wordcount1.jar file:
[hadoop@hc1nn wordcount]$ jar -cvf ./wordcount1.jar -C wc_classes .
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/WordCount.class(in = 1546) (out= 750)(deflated 51%)
adding: org/myorg/WordCount$Reduce.class(in = 1611) (out= 648)(deflated 59%)
adding: org/myorg/WordCount$Map.class(in = 1938) (out= 798)(deflated 58%)
[hadoop@hc1nn wordcount]$ ls -l *.jar
-rw-rw-r--. 1 hadoop hadoop 3169 Jun 15 15:05 wordcount1.jar
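To confirm that the classes were packaged as expected, the jar command's -tf option lists the archive contents without extracting them (a quick sanity check; the listing should match the entries added above, plus the generated META-INF/MANIFEST.MF):
[hadoop@hc1nn wordcount]$ jar -tf ./wordcount1.jar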
This file can now be used to run a word-count job on Hadoop. As in previous MapReduce runs, the input and
output data for the job will be taken from HDFS. To provide the words to count, I copied some Edgar Allan Poe
texts from the Linux file system into a directory on HDFS. The Linux ls command shows the text files that will
be used:
[hadoop@hc1nn wordcount]$ ls $HOME/edgar
10031.txt 15143.txt 17192.txt 2149.txt 932.txt
Copying these files to the HDFS directory called /user/hadoop/edgar, using the Hadoop file system
copyFromLocal command, sets up the data for the word-count job:
[hadoop@hc1nn wordcount]$ hadoop dfs -copyFromLocal $HOME/edgar/* /user/hadoop/edgar
[hadoop@hc1nn wordcount]$ hadoop dfs -ls /user/hadoop/edgar
Found 5 items
-rw-r--r-- 1 hadoop supergroup 410012 2014-06-15 15:53 /user/hadoop/edgar/10031.txt
-rw-r--r-- 1 hadoop supergroup 559352 2014-06-15 15:53 /user/hadoop/edgar/15143.txt
-rw-r--r-- 1 hadoop supergroup 66401 2014-06-15 15:53 /user/hadoop/edgar/17192.txt
-rw-r--r-- 1 hadoop supergroup 596736 2014-06-15 15:53 /user/hadoop/edgar/2149.txt
-rw-r--r-- 1 hadoop supergroup 63278 2014-06-15 15:53 /user/hadoop/edgar/932.txt
Running the word-count example against the data in the input directory (/user/hadoop/edgar) creates the
results data in the output directory (/user/hadoop/edgar-results). First, though, use the jps command to make
sure that all of the Hadoop processes are up before you run the job:
[hadoop@hc1nn wordcount]$ jps
1959 SecondaryNameNode
1839 DataNode
4166 TaskTracker
4272 Jps
1720 NameNode
4044 JobTracker
This shows that the HDFS processes for the data node and name node are running on hc1nn, and that the
MapReduce JobTracker and TaskTracker processes are running as well. If you are going to rerun this job, you will
need to delete the HDFS-based results directory first by using the Hadoop file system rmr command:
[hadoop@hc1nn wordcount]$ hadoop dfs -rmr /user/hadoop/edgar-results
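With the input data in place and no pre-existing results directory, the job can be submitted via the hadoop jar
command. As a minimal sketch (assuming the driver class is org.myorg.WordCount, the class packaged into the jar
above, and that it takes the input and output directories as its arguments):
[hadoop@hc1nn wordcount]$ hadoop jar ./wordcount1.jar org.myorg.WordCount /user/hadoop/edgar /user/hadoop/edgar-results
When the job completes, the word counts land in part files under /user/hadoop/edgar-results and can be inspected
with the Hadoop file system cat command (with a single reducer, the output file is typically named part-00000):
[hadoop@hc1nn wordcount]$ hadoop dfs -cat /user/hadoop/edgar-results/part-00000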