to datanodes. This is possible only because the namenode shares its secret key used to generate the block access token with datanodes (sending it in heartbeat messages), so that they can verify block access tokens. Thus, an HDFS block may be accessed only by a client with a valid block access token from a namenode. This closes the security hole in unsecured Hadoop, where only the block ID was needed to gain access to a block. This property is enabled by setting dfs.block.access.token.enable to true.
In MapReduce, job resources and metadata (such as JAR files, input splits, and configuration files) are shared in HDFS for the application master to access, and user code runs on the node managers and accesses files on HDFS (the process is explained in Anatomy of a MapReduce Job Run). Delegation tokens are used by these components to access HDFS during the course of the job. When the job has finished, the delegation tokens are invalidated.
Delegation tokens are automatically obtained for the default HDFS instance, but if your
job needs to access other HDFS clusters, you can load the delegation tokens for these by
setting the mapreduce.job.hdfs-servers job property to a comma-separated list
of HDFS URIs.
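For example (a sketch only; the namenode hostnames and port below are hypothetical placeholders), a job that also reads from a second cluster might set:

```xml
<!-- Job configuration: collect delegation tokens at submission time
     for additional HDFS clusters the job will touch -->
<property>
  <name>mapreduce.job.hdfs-servers</name>
  <value>hdfs://namenode1:8020,hdfs://namenode2:8020</value>
</property>
```

The tokens for each listed URI are obtained when the job is submitted, so the submitting user must be authenticated to every cluster in the list.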
Other Security Enhancements
Security has been tightened throughout the Hadoop stack to protect against unauthorized
access to resources. The more notable features are listed here:
▪ Tasks can be run using the operating system account for the user who submitted the job, rather than the user running the node manager. This means that the operating system is used to isolate running tasks, so they can't send signals to each other (to kill another user's tasks, for example), and so local information, such as task data, is kept private via local filesystem permissions.
This feature is enabled by setting yarn.nodemanager.container-executor.class to org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor. [75]
In addition, administrators need to ensure that each user is given an account on
every node in the cluster (typically using LDAP).
▪ When tasks are run as the user who submitted the job, the distributed cache (see
Distributed Cache ) is secure. Files that are world-readable are put in a shared
cache (the insecure default); otherwise, they go in a private cache, readable only
by the owner.
▪ Users can view and modify only their own jobs, not others'. This is enabled by setting mapreduce.cluster.acls.enabled to true. There are two job