of time on resource utilization and balancing. A balanced configuration will
allow you to spend less time administering your cluster and more time
running awesome solutions that provide value to the line-of-business
sponsoring the solution.
Planning for Ongoing Maintenance
There are a few tasks that you should be acquainted with in order to perform
ongoing maintenance of a Hadoop cluster. In this section, we'll cover what
you need to know in order to stop jobs, add nodes, and finally rebalance
nodes if the data becomes skewed.
Stopping a Map-Reduce Job
One responsibility of a Hadoop administrator is starting and stopping
map-reduce jobs. You may be asked to kill a job submitted by a user because
it's running longer than the user expected. This might happen because there
is more data than the user expected, or because the job is simply running an
incorrect algorithm or process. When a job is killed, all processes associated
with the job are stopped, the memory associated with the job is discarded,
and temporary data written to disk is deleted; the originator of the job
receives notification that the job has failed.
To stop a map-reduce job, complete the following steps:
1. From a Hadoop command prompt, run hadoop job -list and note the ID of the job you want to stop.
2. Run hadoop job -kill <jobid> to kill the job.
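For example, to kill a job whose ID appeared in the -list output (the job ID shown here is a made-up placeholder; substitute one from your own listing):
hadoop job -list
hadoop job -kill job_201401011200_0001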
Adding and Removing Cluster Nodes
Usually, data nodes are added because there is a need for additional storage
capacity. However, it is entirely possible that they are added to provide
additional I/O bandwidth in response to additional computing
requirements. Adding a data node is a quick online process: add the new
node's hostname to the include file referenced by the NameNode's
configuration, and then tell the NameNode to reread the file:
hadoop dfsadmin -refreshNodes
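As a minimal sketch, and assuming the NameNode's hdfs-site.xml points the dfs.hosts property at an include file (the path below is hypothetical), the relevant configuration looks like this:
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/dfs.include</value>
</property>
Append the new node's hostname to that file before running the refresh. Removing a node follows the same pattern: list it in the exclude file named by dfs.hosts.exclude and run hadoop dfsadmin -refreshNodes to begin decommissioning it.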
Rebalancing Cluster Nodes
Hadoop nodes become unbalanced most often when new data nodes are
added to a cluster. Those new data nodes receive as much new data as any of
the existing nodes, but they start out empty, so disk utilization remains
uneven across the cluster until it is rebalanced.
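To rebalance, run the HDFS balancer from a Hadoop command prompt. The threshold value below is only an example; it sets how many percentage points a node's utilization may deviate from the cluster average before the balancer moves blocks:
hadoop balancer -threshold 5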