To take a look at the results (found in the HDFS directory /user/hadoop/edgar-results), use the Hadoop file
system ls command:
[hadoop@hc1nn hadoop-1.2.1]$ hadoop fs -ls /user/hadoop/edgar-results
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2014-03-16 14:08
/user/hadoop/edgar-results/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-03-16 14:08
/user/hadoop/edgar-results/_logs
-rw-r--r-- 1 hadoop supergroup 769870 2014-03-16 14:08
/user/hadoop/edgar-results/part-r-00000
This shows that the word-count job has created a file called _SUCCESS to indicate a positive outcome, a log directory called _logs, and a data file called part-r-00000. The last file in the list, the part file, is of the most interest. You can extract it from HDFS and look at its contents by using the Hadoop file system cat command:
[hadoop@hc1nn hadoop-1.2.1]$ mkdir -p /tmp/hadoop/
[hadoop@hc1nn hadoop-1.2.1]$ hadoop fs -cat
/user/hadoop/edgar-results/part-r-00000 > /tmp/hadoop/part-r-00000
The results reveal that the test job produced a results file containing 67,721 records. You can show this by using
the Linux command wc -l to produce a line count of the results file:
[hadoop@hc1nn hadoop-1.2.1]$ wc -l /tmp/hadoop/part-r-00000
67721 /tmp/hadoop/part-r-00000
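Beyond counting lines, you can sum the count column to get the total number of word occurrences, as opposed to the number of unique words. Here is a minimal sketch using awk on a small hypothetical sample file in the same tab-separated format (word, tab, count) as the real part file:

```shell
# Hypothetical sample in the word-count output format (word<TAB>count);
# on a real cluster you would point awk at /tmp/hadoop/part-r-00000 instead.
cat > /tmp/sample-counts.txt <<'EOF'
the	3
raven	2
nevermore	1
EOF

# Sum field 2 to get the total number of word occurrences.
awk -F'\t' '{ total += $2 } END { print total }' /tmp/sample-counts.txt
# → 6
```

The distinction matters: wc -l reports 67,721 unique words, while the awk sum would report how many words the original text contained in total.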
By using the Linux head command with a -20 option, you can look at the first 20 lines of the output part file on the
Linux file system:
[hadoop@hc1nn hadoop-1.2.1]$ head -20 /tmp/hadoop/part-r-00000
! 1
" 22
"''T 1
"'-- 1
"'A 1
"'After 1
"'Although 1
"'Among 2
"'And 2
"'Another 1
"'As 2
"'At 1
"'Aussi 1
"'Be 2
"'Being 1
"'But 1
"'But,' 1
"'But--still--monsieur----' 1
"'Catherine, 1
"'Comb 1
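The head of the file is dominated by punctuation tokens, since the reducer output is sorted by key rather than by count. If you want to see the most frequent words instead, a numeric sort on the second column is a common next step; the sketch below uses a small hypothetical sample file so the command can be shown end to end, but on the real output you would substitute /tmp/hadoop/part-r-00000:

```shell
# Hypothetical sample in the word-count output format (word<TAB>count).
cat > /tmp/sample-sort.txt <<'EOF'
the	120
raven	12
and	95
of	80
EOF

# Sort numerically (n) and in reverse (r) on the count column (-k2,2),
# then show the top three most frequent words.
sort -t$'\t' -k2,2nr /tmp/sample-sort.txt | head -3
```

On the Edgar Allan Poe results, this kind of sort quickly surfaces common English words at the top of the list.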