We also could have used a relative path and copied the file to our home directory in
HDFS, which in this case is /user/tom:
% hadoop fs -copyFromLocal input/docs/quangle.txt quangle.txt
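One way to confirm where the relative path ended up is to list the file by its absolute
name; assuming the home directory is /user/tom as above, the two forms refer to the
same file:
% hadoop fs -ls /user/tom/quangle.txt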
Let's copy the file back to the local filesystem and check whether it's the same:
% hadoop fs -copyToLocal quangle.txt quangle.copy.txt
% md5 input/docs/quangle.txt quangle.copy.txt
MD5 (input/docs/quangle.txt) = e7891a2627cf263a079fb0f18256ffb2
MD5 (quangle.copy.txt) = e7891a2627cf263a079fb0f18256ffb2
The MD5 digests are the same, showing that the file survived its trip to HDFS and is back
intact.
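If your system has no md5 command (on most Linux distributions it is called md5sum), a
plain diff does the same job; it prints nothing when the two copies are byte-for-byte
identical:
% diff input/docs/quangle.txt quangle.copy.txt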
Finally, let's look at an HDFS file listing. We create a directory first just to see how it is
displayed in the listing:
% hadoop fs -mkdir topics
% hadoop fs -ls .
Found 2 items
drwxr-xr-x   - tom supergroup          0 2014-10-04 13:22 topics
-rw-r--r--   1 tom supergroup        119 2014-10-04 13:21 quangle.txt
The information returned is very similar to that returned by the Unix command ls -l,
with a few minor differences. The first column shows the file mode. The second column is
the replication factor of the file (something a traditional Unix filesystem does not have).
Remember that we set the default replication factor in the site-wide configuration to be 1,
which is why we see the same value here. This column shows a dash for directories because
the concept of replication does not apply to them: directories are treated as metadata and
stored by the namenode, not the datanodes. The third and fourth columns show the file
owner and group. The fifth column is the size of the file in bytes, or zero for directories.
The sixth and seventh columns are the last modified date and time. Finally, the eighth
column is the name of the file or directory.
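If you need individual attributes rather than the whole listing (to use in a script, say),
the fs -stat command prints selected fields using a format string, and fs -setrep changes
the replication factor reported in the second column. The following is only a sketch; the
exact set of format specifiers accepted by -stat varies slightly between Hadoop releases:
% hadoop fs -stat "%u %g %r %b %y %n" quangle.txt
% hadoop fs -setrep 2 quangle.txt
After the second command, the listing shows 2 in the replication column, although on a
single-datanode cluster the extra replica cannot actually be placed; the file simply
remains under-replicated until more datanodes are available.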