configuration properties, mapreduce.job.acl-view-job and mapreduce.job.acl-modify-job,
which may be set to a comma-separated list of users to control who may view or
modify a particular job.
▪ The shuffle is secure, preventing a malicious user from requesting another user's
map outputs.
▪ When appropriately configured, it's no longer possible for a malicious user to run
a rogue secondary namenode, datanode, or node manager that can join the cluster
and potentially compromise data stored in the cluster. This is enforced by
requiring daemons to authenticate with the master node they are connecting to.
To enable this feature, you first need to configure Hadoop to use a keytab
previously generated with the ktutil command. For a datanode, for example, you
would set the dfs.datanode.keytab.file property to the keytab filename and
dfs.datanode.kerberos.principal to the username to use for the datanode.
Finally, the ACL for the DataNodeProtocol (which is used by datanodes to
communicate with the namenode) must be set in hadoop-policy.xml, by restricting
security.datanode.protocol.acl to the datanode's username.
▪ A datanode may be run on a privileged port (a port number below 1024, which
only a privileged user can bind), so a client may be reasonably sure that it
was started securely.
▪ A task may communicate only with its parent application master, thus preventing
an attacker from obtaining MapReduce data from another user's job.
▪ Various parts of Hadoop can be configured to encrypt network data, including
RPC (hadoop.rpc.protection), HDFS block transfers (dfs.encrypt.data.transfer),
the MapReduce shuffle (mapreduce.shuffle.ssl.enabled), and the web UIs
(hadoop.ssl.enabled). Work is ongoing to encrypt data “at rest,” too, so
that HDFS blocks can be stored in encrypted form, for example.
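
As an illustration of the job ACL properties mentioned in the list above, a job's configuration might carry entries like the following. This is a sketch rather than a recommended setup: the usernames alice and bob are hypothetical, and it assumes that ACL checking has already been enabled on the cluster.

```xml
<!-- Job configuration fragment; the usernames are hypothetical placeholders -->
<property>
  <name>mapreduce.job.acl-view-job</name>
  <!-- users allowed to view this job -->
  <value>alice,bob</value>
</property>
<property>
  <name>mapreduce.job.acl-modify-job</name>
  <!-- users allowed to modify this job -->
  <value>alice</value>
</property>
```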
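
The datanode authentication settings described above can be sketched as two configuration fragments. The keytab path, the principal and realm, and the username hdfs are assumptions chosen for illustration; substitute the values used at your site.

```xml
<!-- hdfs-site.xml fragment (keytab path and principal are hypothetical) -->
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <!-- _HOST is replaced with the local hostname at startup -->
  <value>hdfs/_HOST@EXAMPLE.COM</value>
</property>

<!-- hadoop-policy.xml fragment: restrict DataNodeProtocol to the
     (hypothetical) hdfs user -->
<property>
  <name>security.datanode.protocol.acl</name>
  <value>hdfs</value>
</property>
```

The ACLs in hadoop-policy.xml are consulted only when service-level authorization is switched on, which is done by setting hadoop.security.authorization to true in core-site.xml.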
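
The encryption switches named in the last item of the list might be set as follows. This is only a sketch: the file that each property is assumed to live in is noted in a comment, the values shown are the ones that turn encryption on, and supporting setup such as SSL keystores is assumed to exist and is not shown.

```xml
<!-- core-site.xml: "privacy" adds encryption on top of authentication
     and integrity checking for RPC -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>
<!-- core-site.xml: SSL for the web UIs -->
<property>
  <name>hadoop.ssl.enabled</name>
  <value>true</value>
</property>

<!-- hdfs-site.xml: encrypt HDFS block transfers -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<!-- mapred-site.xml: SSL for the MapReduce shuffle -->
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>true</value>
</property>
```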