have very different interfaces for working with their services, so once you've developed some automation around the process of building and tearing down these test clusters, you've effectively locked yourself in with a single service provider. Apache Whirr provides a standard mechanism for working with a handful of different service providers. This allows you to easily change cloud providers or to share configurations with other teams that do not use the same cloud provider.
The most basic building block of Whirr is the instance template. Instance templates define a purpose; for example, there are templates for the Hadoop jobtracker, ZooKeeper, and HBase region nodes. Recipes are one step up the stack from templates and define a cluster. For example, a recipe for a simple data-processing cluster might call for deploying a Hadoop NameNode, a Hadoop jobtracker, a couple of ZooKeeper servers, an HBase master, and a handful of HBase region servers.
Tutorial Links
The official Apache Whirr website provides a couple of excellent tutorials. The Whirr in 5 minutes tutorial provides the exact commands necessary to spin up and shut down your first cluster. The quick-start guide is a little more involved, walking through what happens during each stage of the process.
Example Code
In this case, we're going to deploy the simple data cluster we described earlier to an Amazon EC2 account we've already established.

The first step is to build our recipe file (we'll call this file field_guide.properties):
# field_guide.properties
# The name we'll give this cluster,
# this gets communicated with the cloud service provider
whirr.cluster-name=field_guide

# Because we're just testing
# we'll put all the masters on one single machine
# and build only three worker nodes
whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master, \
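Once the recipe file is complete, the cluster is driven from the Whirr command-line tool. The commands below are a sketch of that workflow; they assume the whirr script is on your PATH and that your EC2 credentials have been supplied (via the properties file or environment variables), as described in the Whirr quick-start guide.

```shell
# Launch the cluster described in our recipe file;
# Whirr provisions the instances and installs each role
# (ZooKeeper, NameNode, jobtracker, HBase, and so on).
whirr launch-cluster --config field_guide.properties

# When we're done testing, tear the cluster down so we
# stop paying for the EC2 instances.
whirr destroy-cluster --config field_guide.properties
```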