To take a look at the results (found in the HDFS directory /user/hadoop/edgar-results), use the Hadoop file
system ls command:
[hadoop@hc1nn hadoop-1.2.1]$ hadoop fs -ls /user/hadoop/edgar-results
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2014-03-16 14:08
/user/hadoop/edgar-results/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-03-16 14:08
/user/hadoop/edgar-results/_logs
-rw-r--r-- 1 hadoop supergroup 769870 2014-03-16 14:08
/user/hadoop/edgar-results/part-r-00000
This shows that the word-count job has created a file called _SUCCESS to indicate a positive outcome, a log directory called _logs, and a data file called part-r-00000. The last file in the list, the part file, is of the most interest. You can extract it from HDFS and look at its contents by using the Hadoop file system cat command:
[hadoop@hc1nn hadoop-1.2.1]$ mkdir -p /tmp/hadoop/
[hadoop@hc1nn hadoop-1.2.1]$ hadoop fs -cat
/user/hadoop/edgar-results/part-r-00000 > /tmp/hadoop/part-r-00000
The results reveal that the test job produced a results file containing 67,721 records. You can show this by using
the Linux command wc -l to produce a line count of the results file:
[hadoop@hc1nn hadoop-1.2.1]$ wc -l /tmp/hadoop/part-r-00000
67721 /tmp/hadoop/part-r-00000
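Beyond counting lines, you can sum the count column to get the total number of word occurrences, as opposed to the number of unique words. Here is a minimal sketch using awk on a small hypothetical sample file in the same tab-separated format (word, tab, count) as the real part file:

```shell
# Hypothetical sample in the word-count output format (word<TAB>count);
# on a real cluster you would point awk at /tmp/hadoop/part-r-00000 instead.
cat > /tmp/sample-counts.txt <<'EOF'
the	3
raven	2
nevermore	1
EOF

# Sum field 2 to get the total number of word occurrences.
awk -F'\t' '{ total += $2 } END { print total }' /tmp/sample-counts.txt
# → 6
```

The distinction matters: wc -l reports 67,721 unique words, while the awk sum would report how many words the original text contained in total.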
By using the Linux head command with a -20 option, you can look at the first 20 lines of the output part file on the
Linux file system:
[hadoop@hc1nn hadoop-1.2.1]$ head -20 /tmp/hadoop/part-r-00000
! 1
" 22
"''T 1
"'-- 1
"'A 1
"'After 1
"'Although 1
"'Among 2
"'And 2
"'Another 1
"'As 2
"'At 1
"'Aussi 1
"'Be 2
"'Being 1
"'But 1
"'But,' 1
"'But--still--monsieur----' 1
"'Catherine, 1
"'Comb 1
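The head of the file is dominated by punctuation tokens, since the reducer output is sorted by key rather than by count. If you want to see the most frequent words instead, a numeric sort on the second column is a common next step; the sketch below uses a small hypothetical sample file so the command can be shown end to end, but on the real output you would substitute /tmp/hadoop/part-r-00000:

```shell
# Hypothetical sample in the word-count output format (word<TAB>count).
cat > /tmp/sample-sort.txt <<'EOF'
the	120
raven	12
and	95
of	80
EOF

# Sort numerically (n) and in reverse (r) on the count column (-k2,2),
# then show the top three most frequent words.
sort -t$'\t' -k2,2nr /tmp/sample-sort.txt | head -3
```

On the Edgar Allan Poe results, this kind of sort quickly surfaces common English words at the top of the list.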