Commissioning and Decommissioning Nodes
As an administrator of a Hadoop cluster, you will need to add or remove nodes from time
to time. For example, to grow the storage available to a cluster, you commission new
nodes. Conversely, sometimes you may wish to shrink a cluster, and to do so, you decommission nodes. Sometimes it is necessary to decommission a node if it is misbehaving,
perhaps because it is failing more often than it should or its performance is noticeably
slow.
Nodes normally run both a datanode and a node manager, and both are typically commissioned or decommissioned in tandem.
Commissioning new nodes
Although commissioning a new node can be as simple as configuring the hdfs-site.xml file to point to the namenode, configuring the yarn-site.xml file to point to the resource manager, and starting the datanode and node manager daemons, it is generally best to have a list of authorized nodes.
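For instance, a minimal sketch of the relevant settings on the new node might look like the following (the hostnames namenode-host and resourcemanager-host are hypothetical, and 8020 is the namenode's default RPC port in Hadoop 2):

    <!-- hdfs-site.xml: where to find the namenode -->
    <property>
      <name>dfs.namenode.rpc-address</name>
      <value>namenode-host:8020</value>
    </property>

    <!-- yarn-site.xml: where to find the resource manager -->
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>resourcemanager-host</value>
    </property>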
It is a potential security risk to allow any machine to connect to the namenode and act as a
datanode, because the machine may gain access to data that it is not authorized to see.
Furthermore, because such a machine is not a real datanode, it is not under your control
and may stop at any time, potentially causing data loss. (Imagine what would happen if a
number of such nodes were connected and a block of data was present only on the “alien”
nodes.) This scenario is a risk even inside a firewall, due to the possibility of misconfiguration, so datanodes (and node managers) should be explicitly managed on all production clusters.
Datanodes that are permitted to connect to the namenode are specified in a file whose name is specified by the dfs.hosts property. The file resides on the namenode's local filesystem, and it contains a line for each datanode, specified by network address (as reported by the datanode; you can see what this is by looking at the namenode's web UI). If you need to specify multiple network addresses for a datanode, put them on one line, separated by whitespace.
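As a sketch, the namenode's hdfs-site.xml might reference an include file as follows (the path /etc/hadoop/conf/include and the hostnames are hypothetical):

    <!-- hdfs-site.xml on the namenode -->
    <property>
      <name>dfs.hosts</name>
      <value>/etc/hadoop/conf/include</value>
    </property>

The include file itself is plain text, with one datanode per line:

    datanode1.example.com
    datanode2.example.com
    datanode3.example.com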
Similarly, node managers that may connect to the resource manager are specified in a file whose name is specified by the yarn.resourcemanager.nodes.include-path property. In most cases, there is one shared file, referred to as the include file, that both dfs.hosts and yarn.resourcemanager.nodes.include-path refer to, since nodes in the cluster run both datanode and node manager daemons.
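Continuing the sketch above, the resource manager's yarn-site.xml can point at the same (hypothetical) file:

    <!-- yarn-site.xml on the resource manager -->
    <property>
      <name>yarn.resourcemanager.nodes.include-path</name>
      <value>/etc/hadoop/conf/include</value>
    </property>

Using a single shared file keeps the HDFS and YARN lists from drifting apart as nodes are added and removed.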