AWS_ACCOUNT_ID — Your 12-digit AWS account ID
AWS_ACCESS_KEY_ID — Your 20-character, alphanumeric Access Key ID
AWS_SECRET_ACCESS_KEY — Your 40-character Secret Access Key
The tools for Hadoop on EC2 get the other security parameters from environment
variables (which should be set when you source aws-init.sh) or rely on
defaults that should work fine if you have followed the AWS setup in section 9.2.
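The three variables above must be exported before you run the EC2 tools. A minimal sketch of what aws-init.sh might contain, using AWS's own documented example credentials as placeholders (substitute your real values; these are not valid keys):

```shell
# aws-init.sh -- minimal sketch; replace the placeholder values with the
# credentials from your AWS account (the values below are AWS's published
# examples, not real credentials).
export AWS_ACCOUNT_ID=123456789012                    # 12-digit account ID
export AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE         # 20-character Access Key ID
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY  # 40 characters
```

Source the file with `. aws-init.sh` (or `source aws-init.sh`) so the exports take effect in your current shell rather than a subshell.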
9.3.2 Configuring cluster type
You'll need to specify the configuration of your Hadoop cluster in hadoop-ec2-env.sh,
where you set three main parameters: HADOOP_VERSION, INSTANCE_TYPE, and
S3_BUCKET. Before telling you how to set these parameters, let's go over a little
background.
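As a preview, the relevant lines of hadoop-ec2-env.sh look something like the following sketch (the values shown are illustrative, not recommendations):

```shell
# Excerpt from hadoop-ec2-env.sh -- illustrative values only
HADOOP_VERSION=0.18.0     # must match a Hadoop version available as a public image
INSTANCE_TYPE=m1.large    # one of the standard types in table 9.1
S3_BUCKET=hadoop-images   # S3 bucket holding the boot image for that version
```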
Before creating an instance, Amazon EC2 must know the instance type and
the image used to boot the instance. The instance type is the hardware configuration of
the virtual machine (CPU, RAM, disk space, etc.). As of this writing, five instance types
are available, grouped into two families: standard and high-CPU. High-CPU types are
intended for compute-intensive work; they're rarely used for Hadoop applications, which
tend to be data-intensive. The standard family has three instance types, and table 9.1 lists
their attributes. The more powerful instance types cost more; check the AWS
website for the latest pricing.
Images for booting up EC2 instances can be stored only in Amazon's S3
storage service. Many existing images are available for all kinds of setups. You can use one
of the public images, pay for special custom images, or even create your own. Related
images are stored in the same S3 bucket.3 The standard public Hadoop images are in
either the hadoop-ec2-images bucket or the hadoop-images bucket. In fact, we use only
the hadoop-images bucket, because the newer versions of Hadoop (after 0.17.1) aren't
available in the hadoop-ec2-images bucket. The Hadoop team puts new EC2 images in
the hadoop-images bucket when significant versions of Hadoop are released. At any point
in time, execute the following EC2 command to see the available Hadoop images:
ec2-describe-images -x all | grep hadoop-images
Table 9.1 Specification for various EC2 instance types

Type         CPU                   Memory   Storage   Platform   I/O        Name
Small        1 EC2 Compute Unit    1.7 GB   160 GB    32-bit     Moderate   m1.small
Large        4 EC2 Compute Units   7.5 GB   850 GB    64-bit     High       m1.large
Extra Large  8 EC2 Compute Units   15 GB    1690 GB   64-bit     High       m1.xlarge
3 An S3 bucket is the top-level partition in S3's namespace. A bucket is owned by exactly one AWS account
and must have a globally unique name.