have very different interfaces for working with their services, so once you've developed some automation around the process of building and tearing down these test clusters, you've effectively locked yourself in with a single service provider. Apache Whirr provides a standard mechanism for working with a handful of different service providers. This allows you to easily change cloud providers or to share configurations with other teams that do not use the same cloud provider.
The most basic building block of Whirr is the instance template. Instance templates define a purpose; for example, there are templates for the Hadoop jobtracker, ZooKeeper, and HBase region nodes. Recipes are one step up the stack from templates and define a cluster. For example, a recipe for a simple data-processing cluster might call for deploying a Hadoop NameNode, a Hadoop jobtracker, a couple of ZooKeeper servers, an HBase master, and a handful of HBase region servers.
Tutorial Links
The official Apache Whirr website provides a couple of excellent tutorials. The Whirr in 5 minutes tutorial provides the exact commands necessary to spin up and shut down your first cluster. The quick-start guide is a little more involved, walking through what happens during each stage of the process.
Example Code
In this case, we're going to deploy the simple data cluster we described earlier to an Amazon EC2 account we've already established.

The first step is to build our recipe file (we'll call this file field_guide.properties):
# field_guide.properties
# The name we'll give this cluster,
# this gets communicated with the cloud service provider
whirr.cluster-name=field_guide

# Because we're just testing
# we'll put all the masters on one single machine
# and build only three worker nodes
whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master, \
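Once the recipe file is complete, the cluster is driven from the Whirr command-line tool. The commands below are a sketch of that workflow; they assume the whirr script is on your PATH and that your EC2 credentials have been supplied (via the properties file or environment variables), as described in the Whirr quick-start guide.

```shell
# Launch the cluster described in our recipe file;
# Whirr provisions the instances and installs each role
# (ZooKeeper, NameNode, jobtracker, HBase, and so on).
whirr launch-cluster --config field_guide.properties

# When we're done testing, tear the cluster down so we
# stop paying for the EC2 instances.
whirr destroy-cluster --config field_guide.properties
```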