Configuring SSH
The Hadoop control scripts (but not the daemons) rely on SSH to perform cluster-wide operations. For example, there is a script for stopping and starting all the daemons in the cluster. Note that the control scripts are optional; cluster-wide operations can be performed by other mechanisms, too, such as a distributed shell or dedicated Hadoop management applications.
To work seamlessly, SSH needs to be set up to allow passwordless login for the hdfs and yarn users from machines in the cluster. The simplest way to achieve this is to generate a public/private key pair and place it in an NFS location that is shared across the cluster.
First, generate an RSA key pair by typing the following. You need to do this twice, once as the hdfs user and once as the yarn user:
% ssh-keygen -t rsa -f ~/.ssh/id_rsa
Even though we want passwordless logins, keys without passphrases are not considered good practice (it's OK to have an empty passphrase when running a local pseudo-distributed cluster, as described in Appendix A), so we specify a passphrase when prompted for one. We use ssh-agent to avoid the need to enter the passphrase for each connection.
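The agent workflow can be sketched as follows. The temporary key directory and the empty passphrase here are assumptions so the sketch runs unattended; a real cluster key should carry a passphrase, which ssh-add prompts for once and then caches for the login session.

```shell
# Sketch of the ssh-agent workflow. The throwaway key directory and the
# empty passphrase are for illustration only, so this runs without
# prompting; real keys should have a passphrase.
set -e
demo_dir=$(mktemp -d)

# Start an agent and export SSH_AUTH_SOCK/SSH_AGENT_PID into this shell.
eval "$(ssh-agent -s)" > /dev/null

# Generate a stand-in key (empty passphrase ONLY so this runs unattended).
ssh-keygen -q -t rsa -N '' -f "$demo_dir/id_rsa"

# Load the key into the agent; subsequent ssh connections can then use
# it without prompting for the passphrase.
ssh-add "$demo_dir/id_rsa" 2> /dev/null

# List the identities the agent now holds.
identities=$(ssh-add -l)
echo "$identities"

# Clean up the agent and the throwaway key.
ssh-agent -k > /dev/null
rm -rf "$demo_dir"
```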
The private key is in the file specified by the -f option, ~/.ssh/id_rsa, and the public key is stored in a file with the same name but with .pub appended, ~/.ssh/id_rsa.pub.
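This naming convention is easy to verify; the temporary directory and the -N flag (which supplies the passphrase non-interactively) are used here purely for illustration, and real keys belong in ~/.ssh:

```shell
# Show that ssh-keygen writes the private key at the -f path and the
# public key at the same path with .pub appended. The temp directory
# and fixed passphrase are assumptions for this sketch only.
set -e
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -N 'demo passphrase' -f "$keydir/id_rsa"
files=$(ls "$keydir")
echo "$files"    # the directory now holds id_rsa and id_rsa.pub
rm -rf "$keydir"
```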
Next, we need to make sure that the public key is in the ~/.ssh/authorized_keys file on all the machines in the cluster that we want to connect to. If the users' home directories are stored on an NFS filesystem, the keys can be shared across the cluster by typing the following (first as hdfs and then as yarn):
% cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
If the home directory is not shared using NFS, the public keys will need to be shared by some other means (such as ssh-copy-id).
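The effect of either approach can be sketched locally (the temporary directory stands in for a target machine's home directory, which is an assumption of this sketch): the public key line is appended to authorized_keys, and sshd, with its default StrictModes setting, additionally requires strict permissions on the file and directory before it will honor the key.

```shell
# Mimic what the NFS-shared append (or ssh-copy-id) does on a target
# host, in a temp directory so the sketch is self-contained. On a real
# worker the paths would be under the hdfs or yarn user's home.
set -e
home=$(mktemp -d)
mkdir -p "$home/.ssh"

# Generate a key pair to stand in for ~/.ssh/id_rsa.pub.
ssh-keygen -q -t rsa -N 'demo passphrase' -f "$home/.ssh/id_rsa"

# Append the public key; sshd consults this file on the target host.
cat "$home/.ssh/id_rsa.pub" >> "$home/.ssh/authorized_keys"

# With StrictModes (the default), sshd rejects keys if ~/.ssh or
# authorized_keys is group- or world-writable.
chmod 700 "$home/.ssh"
chmod 600 "$home/.ssh/authorized_keys"

count=$(wc -l < "$home/.ssh/authorized_keys")
rm -rf "$home"
```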
Test that you can SSH from the master to a worker machine by making sure ssh-agent is running, then run ssh-add to store your passphrase. You should be able to SSH to the worker without entering the passphrase again.