Before adding data to HDFS, there must be a directory to hold it. To create a user directory for the MSBigDataSolutions user, you run the mkdir command:

hadoop dfs -mkdir /user/MSBigDataSolutions
If a directory is created by accident, or it is no longer needed, you can remove it by using the rmr command. rmr is short for remove recursive, and it removes the directory specified and any subdirectories:

hadoop dfs -rmr /user/DirectoryToRemove
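Because rmr deletes everything beneath the target in one step, it is worth being clear on what recursive removal means before using it. The same semantics can be illustrated on an ordinary local file system; this is a Python sketch with illustrative paths, not a Hadoop API call:

```python
import os
import shutil
import tempfile

# Build a local stand-in for /user/DirectoryToRemove with a
# nested subdirectory and a file (names are illustrative).
root = tempfile.mkdtemp()
target = os.path.join(root, "DirectoryToRemove")
os.makedirs(os.path.join(target, "subdir"))
open(os.path.join(target, "subdir", "data.txt"), "w").close()

# Like rmr, shutil.rmtree removes the directory and everything
# beneath it in a single call -- there is no prompt per file.
shutil.rmtree(target)
removed = not os.path.exists(target)
print(removed)

shutil.rmtree(root)  # clean up the scratch area
```

As with rmr, there is no undo once the call returns, so double-check the path before running it.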
After a directory has been selected or created, files can be copied to it. The most common scenario for this is to copy files from the local file system into HDFS using the put command. This example uses the sample data files created in this chapter:

hadoop dfs -put C:\MSBigDataSolutions\SampleData1.txt /user/MSBigDataSolutions
This command loads a single file from the local file system (C:\MSBigDataSolutions\SampleData1.txt) to a directory in HDFS (/user/MSBigDataSolutions). You can use the following command to verify the file was loaded correctly:

hadoop dfs -ls /user/MSBigDataSolutions
put can load multiple files to HDFS simultaneously. You do that by using a folder as the source path, in which case all files in the folder are uploaded. You can also do so by using wildcards in the source path:

hadoop dfs -put C:\MSBigDataSolutions\SampleData_* /user/MSBigDataSolutions
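Before running a wildcard put against a cluster, it can help to preview which local files the pattern actually matches. A minimal Python sketch using the standard library's glob module, with hypothetical file names that mimic the chapter's naming scheme:

```python
import glob
import os
import shutil
import tempfile

# Stage a few local files; the SampleData_ names follow the
# chapter's naming scheme but are hypothetical.
staging = tempfile.mkdtemp()
for name in ("SampleData_1.txt", "SampleData_2.txt", "Notes.txt"):
    open(os.path.join(staging, name), "w").close()

# A local glob previews which files the SampleData_* wildcard
# would select for upload -- Notes.txt is excluded.
matches = sorted(os.path.basename(p)
                 for p in glob.glob(os.path.join(staging, "SampleData_*")))
print(matches)

shutil.rmtree(staging)  # clean up the scratch area
```

This is only a local preview of the pattern match; the upload itself still goes through the put command shown above.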
Two other commands are related to put. copyFromLocal works exactly like the put command, and is simply an alias for it. moveFromLocal also functions like put, with the difference that the local file is deleted after the specified file(s) are loaded into HDFS.
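The practical difference between the copy-style commands and moveFromLocal comes down to whether the source file survives the transfer. A local-file-system sketch of the two semantics, with illustrative file names rather than real Hadoop calls:

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "SampleData1.txt")  # hypothetical local file
with open(src, "w") as f:
    f.write("sample")

# put / copyFromLocal semantics: the source file survives the transfer.
shutil.copy(src, os.path.join(workdir, "copied.txt"))
source_survives_copy = os.path.exists(src)

# moveFromLocal semantics: the source file is gone afterward.
shutil.move(src, os.path.join(workdir, "moved.txt"))
source_survives_move = os.path.exists(src)

print(source_survives_copy, source_survives_move)

shutil.rmtree(workdir)  # clean up the scratch area
```

Choose moveFromLocal when the local staging copy is disposable, and put or copyFromLocal when the local copy must be retained.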