We also could have used a relative path and copied the file to our home directory in
HDFS, which in this case is /user/tom:
% hadoop fs -copyFromLocal input/docs/quangle.txt quangle.txt
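One way to confirm where the relative path ended up is to list the file by its absolute
name; assuming the home directory is /user/tom as above, the two forms refer to the
same file:
% hadoop fs -ls /user/tom/quangle.txt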
Let's copy the file back to the local filesystem and check whether it's the same:
% hadoop fs -copyToLocal quangle.txt quangle.copy.txt
% md5 input/docs/quangle.txt quangle.copy.txt
MD5 (input/docs/quangle.txt) = e7891a2627cf263a079fb0f18256ffb2
MD5 (quangle.copy.txt) = e7891a2627cf263a079fb0f18256ffb2
The MD5 digests are the same, showing that the file survived its trip to HDFS and is back
intact.
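If your system has no md5 command (on most Linux distributions it is called md5sum), a
plain diff does the same job; it prints nothing when the two copies are byte-for-byte
identical:
% diff input/docs/quangle.txt quangle.copy.txt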
Finally, let's look at an HDFS file listing. We create a directory first just to see how it is
displayed in the listing:
% hadoop fs -mkdir topics
% hadoop fs -ls .
Found 2 items
drwxr-xr-x   - tom supergroup          0 2014-10-04 13:22 topics
-rw-r--r--   1 tom supergroup        119 2014-10-04 13:21 quangle.txt
The information returned is very similar to that returned by the Unix command ls -l,
with a few minor differences. The first column shows the file mode. The second column is
the replication factor of the file (something a traditional Unix filesystem does not have).
Remember that we set the default replication factor in the site-wide configuration to be 1,
which is why we see the same value here. This column shows a dash for directories because
the concept of replication does not apply to them: directories are treated as metadata and
stored by the namenode, not the datanodes. The third and fourth columns show the file
owner and group. The fifth column is the size of the file in bytes, or zero for directories.
The sixth and seventh columns are the last modified date and time. Finally, the eighth
column is the name of the file or directory.
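If you need individual attributes rather than the whole listing (to use in a script, say),
the fs -stat command prints selected fields using a format string, and fs -setrep changes
the replication factor reported in the second column. The following is only a sketch; the
exact set of format specifiers accepted by -stat varies slightly between Hadoop releases:
% hadoop fs -stat "%u %g %r %b %y %n" quangle.txt
% hadoop fs -setrep 2 quangle.txt
After the second command, the listing shows 2 in the replication column, although on a
single-datanode cluster the extra replica cannot actually be placed; the file simply
remains under-replicated until more datanodes are available.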