of time on resource utilization and balancing. A balanced configuration will
allow you to spend less time administering your cluster and more time
running awesome solutions that provide value to the line-of-business
sponsoring the solution.
Planning for Ongoing Maintenance
There are a few tasks that you should be acquainted with in order to perform
ongoing maintenance of a Hadoop cluster. In this section, we'll cover what
you need to know in order to stop jobs, add nodes, and finally rebalance
nodes if the data becomes skewed.
Stopping a Map-Reduce Job
One responsibility of a Hadoop administrator is starting and stopping
map-reduce jobs. You may be asked to kill a job submitted by a user because
it's running longer than the user expected. This might happen because there
is more data than the user expected, or because the job is simply running an
incorrect algorithm or process. When a job is killed, all processes associated
with the job are stopped, the memory associated with the job is discarded,
and temporary data written to disk is deleted; the originator of the job
receives notification that the job has failed.
To stop a map-reduce job, complete the following steps:
1. From a Hadoop command prompt, run hadoop job -list and note the ID of the job you want to stop.
2. Run hadoop job -kill <jobid> to kill the job.
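For example, to kill a job whose ID appeared in the -list output (the job ID shown here is a made-up placeholder; substitute one from your own listing):
hadoop job -list
hadoop job -kill job_201401011200_0001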
Adding and Removing Cluster Nodes
Usually, data nodes are added because there is a need for additional storage
capacity. However, it is entirely possible that they are added to provide
additional I/O bandwidth in response to additional computing
requirements. Adding a data node is a quick online process: add the new
node's hostname to the include file referenced by the NameNode's
configuration, and then tell the NameNode to reread the file:
hadoop dfsadmin -refreshNodes
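As a minimal sketch, and assuming the NameNode's hdfs-site.xml points the dfs.hosts property at an include file (the path below is hypothetical), the relevant configuration looks like this:
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/dfs.include</value>
</property>
Append the new node's hostname to that file before running the refresh. Removing a node follows the same pattern: list it in the exclude file named by dfs.hosts.exclude and run hadoop dfsadmin -refreshNodes to begin decommissioning it.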
Rebalancing Cluster Nodes
Hadoop nodes become unbalanced most often when new data nodes are
added to a cluster. Those new data nodes receive as much new data as any of
the existing nodes, but they start out empty, so disk utilization remains
uneven across the cluster until it is rebalanced.
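To rebalance, run the HDFS balancer from a Hadoop command prompt. The threshold value below is only an example; it sets how many percentage points a node's utilization may deviate from the cluster average before the balancer moves blocks:
hadoop balancer -threshold 5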