Responsibilities of a Hadoop administrator
With the growing interest in deriving insights from big data, organizations are now planning and building their big data teams aggressively. To start working on their data, they need a good, solid infrastructure. Once this is set up, they need several controls and system policies in place to maintain, manage, and troubleshoot their clusters.
There is an ever-increasing demand for Hadoop administrators in the market, as their function (setting up and maintaining Hadoop clusters) is what makes analysis possible in the first place.
The Hadoop administrator needs to be very good at system operations, networking, operating systems, and storage, and needs a strong knowledge of computer hardware and how it behaves in a complex network.
Apache Hadoop runs mainly on Linux, so good Linux skills such as monitoring, troubleshooting, configuration, and security are a must.
Setting up nodes for clusters involves many repetitive tasks, and the Hadoop administrator should use quicker and more efficient ways to bring up these servers, using configuration management tools such as Puppet, Chef, and CFEngine. Apart from these tools, the administrator should also have good capacity planning skills to design and plan clusters.
Several nodes in a cluster need duplication of data; for example, the fsimage file of the namenode daemon can be configured to be written to two different disks on the same node, or to a disk on a different node. An understanding of NFS mount points and how to set them up within a cluster is required. The administrator may also be asked to set up RAID for disks on specific nodes.
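The fsimage duplication described above is driven by the dfs.namenode.name.dir property in hdfs-site.xml (dfs.name.dir in older releases), which takes a comma-separated list of directories. The following Python sketch, assuming the common /etc/hadoop/conf/hdfs-site.xml location, checks that the configured directories actually sit on different devices:

import os
import xml.etree.ElementTree as ET

HDFS_SITE = "/etc/hadoop/conf/hdfs-site.xml"   # assumed location

def fsimage_dirs(conf_path):
    # Pull the directory list out of the dfs.namenode.name.dir property
    root = ET.parse(conf_path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name") == "dfs.namenode.name.dir":
            # The value is a comma-separated list, possibly file:// URIs
            return [v.strip().replace("file://", "")
                    for v in prop.findtext("value").split(",")]
    return []

dirs = fsimage_dirs(HDFS_SITE)
devices = {os.stat(d).st_dev for d in dirs if os.path.exists(d)}
if len(devices) >= 2:
    print("OK: fsimage is written to separate devices:", dirs)
else:
    print("WARNING: fsimage directories do not span two disks:", dirs)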
As all Hadoop services/daemons are built on Java, a basic knowledge of the JVM, along with the ability to understand Java exceptions, is very useful; it helps administrators identify issues quickly.
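Much of this exception reading happens in the daemon logs. The short Python sketch below scans a log file for a few common JVM exception names; the log path and the exception list are illustrative assumptions, not an exhaustive set.

import re

LOG_FILE = "/var/log/hadoop/hadoop-hdfs-namenode.log"   # assumed path
PATTERNS = [
    "java.lang.OutOfMemoryError",    # heap exhaustion on a daemon
    "java.io.IOException",           # disk or network trouble
    "java.net.ConnectException",     # a daemon is unreachable
]

exception_re = re.compile("|".join(re.escape(p) for p in PATTERNS))
with open(LOG_FILE) as log:
    for lineno, line in enumerate(log, 1):
        if exception_re.search(line):
            print(f"{lineno}: {line.rstrip()}")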
The Hadoop administrator should also possess the skills to benchmark the cluster and test its performance under high-traffic scenarios.
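Hadoop ships with stock benchmarks such as TestDFSIO for exactly this purpose. The sketch below drives a write pass and a read pass from Python; the jar location varies by distribution and version, so the path shown is an assumption to adjust.

import subprocess

TESTS_JAR = ("/usr/lib/hadoop-mapreduce/"
             "hadoop-mapreduce-client-jobclient-tests.jar")   # assumed path

def run_dfsio(mode, nr_files=8, file_size="1GB"):
    # mode is "-write" or "-read"; TestDFSIO appends its throughput
    # figures to TestDFSIO_results.log in the working directory
    cmd = ["hadoop", "jar", TESTS_JAR, "TestDFSIO",
           mode, "-nrFiles", str(nr_files), "-fileSize", file_size]
    subprocess.run(cmd, check=True)

run_dfsio("-write")   # generate write load and measure throughput
run_dfsio("-read")    # read the same files back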
Clusters are prone to failures, as they are up all the time and regularly process large amounts of data. To monitor the health of the cluster, the administrator should deploy monitoring tools such as Nagios and Ganglia, and should configure alerts and monitors for critical nodes of the cluster to catch issues before they escalate.
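Nagios checks are simply small programs that print a status line and exit with 0, 1, or 2 for OK, WARNING, or CRITICAL. As one hedged example, this Python sketch turns the "DFS Used%" figure reported by hdfs dfsadmin -report into such a check; the parsing and the thresholds are illustrative assumptions.

import re
import subprocess
import sys

WARN, CRIT = 75.0, 90.0   # percent-used thresholds (illustrative)

report = subprocess.run(["hdfs", "dfsadmin", "-report"],
                        capture_output=True, text=True, check=True).stdout
match = re.search(r"DFS Used%:\s*([\d.]+)%", report)
if not match:
    print("UNKNOWN: could not parse dfsadmin report")
    sys.exit(3)

used = float(match.group(1))
if used >= CRIT:
    print(f"CRITICAL: HDFS {used:.1f}% used")
    sys.exit(2)
if used >= WARN:
    print(f"WARNING: HDFS {used:.1f}% used")
    sys.exit(1)
print(f"OK: HDFS {used:.1f}% used")
sys.exit(0)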