NOTE
The following commands work on an HDInsight cluster using the standard HDFS implementation as well as ASV (discussed in Chapter 13, “Big Data and the Cloud”). However, you need to adjust the paths for each case. To reference a path in the locally attached distributed file system, use hdfs://<namenodehost>/<path> as the path. To reference a path in ASV, use asv://[<container>@]<accountname>.blob.core.windows.net/<path> as the path. You can change the asv prefix to asvs to use an encrypted connection.
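The two path forms in the note can be assembled mechanically. The following is a minimal sketch, not part of HDInsight itself: the helper names hdfs_uri and asv_uri, and the host, account, and container values in the usage comment, are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers that assemble the two path forms described in the note.

# hdfs_uri <namenodehost> <path>
hdfs_uri() {
    # ${2#/} drops one leading slash so the result has a single separator.
    printf 'hdfs://%s/%s\n' "$1" "${2#/}"
}

# asv_uri <scheme: asv|asvs> <accountname> <container or empty> <path>
asv_uri() {
    scheme=$1
    account=$2
    container=$3
    path=${4#/}
    if [ -n "$container" ]; then
        # The container is optional and, when given, precedes the account name.
        printf '%s://%s@%s.blob.core.windows.net/%s\n' "$scheme" "$container" "$account" "$path"
    else
        printf '%s://%s.blob.core.windows.net/%s\n' "$scheme" "$account" "$path"
    fi
}

# Example (placeholder values; run on a cluster node):
# hadoop dfs -ls "$(asv_uri asvs myaccount mycontainer user)"
```

Passing asvs instead of asv as the first argument selects the encrypted connection; the rest of the path is unchanged.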
By default, HDInsight creates the directories listed in Table 5.1 during the initial setup.
Table 5.1 Initial HDFS Root Directories

Directory Name    Purpose
/hive             Directory used by Hive for data storage (see Chapter 6, “Adding Structure with Hive”)
/mapred           Directory used for MapReduce
/user             Directory for user data
You can list the root directories by using the ls or lsr command:
hadoop dfs -ls /
hadoop dfs -lsr /
ls lists the directory contents of the specified folder. In the example, / indicates the root folder. lsr also lists directory contents, but it does so recursively for each subfolder it encounters.
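To compare the two listings side by side without requiring a cluster, the commands can be wrapped in a small guard. This is only a sketch: run_or_print is a hypothetical helper, not a Hadoop command, and it simply prints the command line when no hadoop client is found on the PATH.

```shell
#!/bin/sh
# Hypothetical wrapper: run the command if the hadoop client is installed,
# otherwise print what would have been run.
run_or_print() {
    if command -v hadoop >/dev/null 2>&1; then
        "$@"
    else
        printf 'would run: %s\n' "$*"
    fi
}

# Non-recursive listing of the root folder.
run_or_print hadoop dfs -ls /

# Recursive listing: descends into /hive, /mapred, /user, and their subfolders.
# Later Hadoop releases deprecate lsr in favor of "ls -R".
run_or_print hadoop dfs -lsr /
```

On a large file system the recursive form can produce a great deal of output, so it may be more practical to point it at a specific subfolder rather than the root.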
Normally, user files are created in a subfolder of the /user folder, with the username used as the name of the folder. However, this is not a requirement, and you can tailor the folder structure to fit specific scenarios. The following examples use a fictional user named MSBigDataSolutions.
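As a small illustration of the naming convention just described, the per-user folder path can be derived from the username. The helper user_dir is hypothetical, and the /user/<username> layout is the convention from the text rather than something the file system enforces.

```shell
#!/bin/sh
# Hypothetical helper: build the conventional per-user folder path.
user_dir() {
    printf '/user/%s\n' "$1"
}

target=$(user_dir MSBigDataSolutions)
echo "$target"

# On a cluster node, the folder could then be inspected with the ls
# command shown earlier:
# hadoop dfs -ls "$target"
```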