Before adding data to HDFS, there must be a directory to hold it. To create
a user directory for the MSBigDataSolutions user, you run the mkdir
command:
hadoop dfs -mkdir /user/MSBigDataSolutions
If a directory is created by accident, or it is no longer needed, you can
remove it by using the rmr command. rmr is short for remove recursive, and
it removes the specified directory and any subdirectories:
hadoop dfs -rmr /user/DirectoryToRemove
After a directory has been selected or created, files can be copied to it. The
most common scenario for this is to copy files from the local file system
into HDFS using the put command. This example uses the sample data files
created in this chapter:
hadoop dfs -put C:\MSBigDataSolutions\SampleData1.txt /user/MSBigDataSolutions
This command loads a single file from the local file system
( C:\MSBigDataSolutions\SampleData1.txt ) to a directory in HDFS
( /user/MSBigDataSolutions ). You can use the following command to
verify the file was loaded correctly:
hadoop dfs -ls /user/MSBigDataSolutions
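The ls command lists one line per entry, showing the permissions, replication factor, owner, group, file size, modification time, and full path. After the put above, the output would resemble the following (the owner, timestamp, and size shown here are illustrative):

-rw-r--r--   1 hadoop supergroup      1024 2013-09-15 10:32 /user/MSBigDataSolutions/SampleData1.txt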
put can also load multiple files to HDFS in a single command. You can do
this by using a folder as the source path, in which case all files in the
folder are uploaded, or by using wildcards in the local source path:
hadoop dfs -put C:\MSBigDataSolutions\SampleData_* /user/MSBigDataSolutions
Two other commands are related to put . copyFromLocal works exactly
like the put command, and is simply an alias for it. moveFromLocal also
functions like put , with the difference that the local file is deleted after the
specified file(s) are loaded into HDFS.
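Using the same sample paths as above, the earlier put could therefore be written with either of these commands; the first leaves the local file in place, while the second deletes it once the copy to HDFS succeeds:

hadoop dfs -copyFromLocal C:\MSBigDataSolutions\SampleData1.txt /user/MSBigDataSolutions
hadoop dfs -moveFromLocal C:\MSBigDataSolutions\SampleData1.txt /user/MSBigDataSolutions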