SETTING USER IDENTITY
The user identity that Hadoop uses for permissions in HDFS is determined by running the whoami command on the client system. Similarly, the group names are derived from the output of running groups.
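To see which identity a client would present, you can run the same two commands yourself. The following is a minimal sketch; the function name hdfs_client_identity is made up for illustration and is not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): print the user and group names
# that the local operating system reports for the current account.
hdfs_client_identity() {
  # whoami reports the username; groups reports the group names.
  printf 'user=%s\n' "$(whoami)"
  printf 'groups=%s\n' "$(groups)"
}

hdfs_client_identity
```

The output depends entirely on the local account, so it will differ from machine to machine.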
If, however, your Hadoop user identity is different from the name of your user account on your client machine, you can explicitly set your Hadoop username by setting the HADOOP_USER_NAME environment variable. You can also override user group mappings by means of the hadoop.user.group.static.mapping.overrides configuration property. For example, dr.who=;preston=directors,inventors means that the dr.who user is in no groups, but preston is in the directors and inventors groups.
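As a rough illustration, the sketch below sets the environment variable and then reads a mapping string in the format just described. The helper static_groups_for is hypothetical and only mimics how the value is interpreted; it is not Hadoop's own parser.

```shell
# Act as the Hadoop user "preston" regardless of the local account name.
export HADOOP_USER_NAME=preston

# Hypothetical helper (not part of Hadoop): look up a user's groups in a
# static mapping string such as "dr.who=;preston=directors,inventors".
# The body runs in a subshell so the IFS and globbing changes do not leak.
static_groups_for() (
  mapping=$1
  user=$2
  set -f      # disable globbing while splitting the mapping
  IFS=';'     # entries are separated by semicolons
  for entry in $mapping; do
    if [ "${entry%%=*}" = "$user" ]; then
      # Everything after the first "=" is a comma-separated group list.
      printf '%s\n' "${entry#*=}" | tr ',' ' '
      return 0
    fi
  done
  return 1    # user has no entry in the mapping
)

static_groups_for 'dr.who=;preston=directors,inventors' preston
# prints: directors inventors
```

For dr.who the helper prints an empty line, reflecting that the user is in no groups. A user with no entry at all makes the helper return a nonzero status.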
You can set the user identity that the Hadoop web interfaces run as by setting the hadoop.http.staticuser.user property. By default, it is dr.who, which is not a superuser, so system files are not accessible through the web interface.
Notice that, by default, there is no authentication with this system. See Security for how to use Kerberos authentication with Hadoop.
With this setup, it is easy to use any configuration with the -conf command-line switch. For example, the following command shows a directory listing on the HDFS server running in pseudodistributed mode on localhost:
% hadoop fs -conf conf/hadoop-localhost.xml -ls .
Found 2 items
drwxr-xr-x - tom supergroup 0 2014-09-08 10:19 input
drwxr-xr-x - tom supergroup 0 2014-09-08 10:19 output
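The contents of conf/hadoop-localhost.xml are not shown in this section. As a sketch only, a file of that kind could be generated as follows, assuming a namenode reachable at hdfs://localhost/ (an illustrative value). The function name write_localhost_conf is hypothetical.

```shell
# Hypothetical helper: write a minimal client configuration file that can
# be passed to Hadoop commands with -conf.
# Assumption: hdfs://localhost/ is an illustrative filesystem URI; adjust it
# to match your own pseudodistributed setup.
write_localhost_conf() {
  dir=$1
  mkdir -p "$dir" || return 1
  cat > "$dir/hadoop-localhost.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>
EOF
}

# Usage (requires a Hadoop installation on the PATH):
#   write_localhost_conf conf
#   hadoop fs -conf conf/hadoop-localhost.xml -ls .
```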
If you omit the -conf option, you pick up the Hadoop configuration in the etc/hadoop subdirectory under $HADOOP_HOME. Or, if HADOOP_CONF_DIR is set, Hadoop configuration files will be read from that location.
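The lookup order just described can be expressed in a line of shell. This is a sketch of the rule, not a copy of Hadoop's launcher scripts, and the function name effective_conf_dir is made up for illustration.

```shell
# Hypothetical helper: report the directory that configuration would be read
# from under the rule above. HADOOP_CONF_DIR wins when it is set and
# non-empty; otherwise fall back to etc/hadoop under HADOOP_HOME.
effective_conf_dir() {
  printf '%s\n' "${HADOOP_CONF_DIR:-$HADOOP_HOME/etc/hadoop}"
}

# Usage:
#   HADOOP_HOME=/opt/hadoop effective_conf_dir
```

Here /opt/hadoop is only an example installation path.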
NOTE
Here's an alternative way of managing configuration settings. Copy the etc/hadoop directory from your Hadoop installation to another location, place the *-site.xml configuration files there (with appropriate settings), and set the HADOOP_CONF_DIR environment variable to the alternative location. The main advantage of this approach is that you don't need to specify -conf for every command. It also allows you to isolate changes to files other than the Hadoop XML configuration files (e.g., log4j.properties).
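The steps in this note might look like the following in practice. This is a minimal sketch: the function name use_conf_copy and the destination path in the usage comment are hypothetical.

```shell
# Hypothetical helper: copy a Hadoop configuration directory to another
# location and point HADOOP_CONF_DIR at the copy for the current shell.
use_conf_copy() {
  src=$1    # existing configuration directory, e.g. "$HADOOP_HOME/etc/hadoop"
  dest=$2   # alternative location chosen by you
  mkdir -p "$dest" || return 1
  # "src/." copies the directory's contents, including dotfiles.
  cp -R "$src"/. "$dest"/ || return 1
  export HADOOP_CONF_DIR="$dest"
}

# Usage:
#   use_conf_copy "$HADOOP_HOME/etc/hadoop" "$HOME/hadoop-conf"
#   # then edit the *-site.xml files under $HADOOP_CONF_DIR as needed
```

Because the variable is exported in the calling shell, later hadoop commands started from that shell read the copied files without needing -conf.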