Getting Spark running on Amazon EC2
The Spark project provides scripts to run a Spark cluster in the cloud on Amazon's EC2 service. These scripts are located in the ec2 directory. You can run the spark-ec2 script contained in this directory with the following command:
>./ec2/spark-ec2
Running it in this way without an argument will show the help output:
Usage: spark-ec2 [options] <action> <cluster_name>
<action> can be: launch, destroy, login, stop, start,
get-master
Options:
...
Before creating a Spark EC2 cluster, make sure you have an Amazon Web Services (AWS) account.
Tip
If you don't have an Amazon Web Services account, you can sign up at
ht-
The AWS console is available at http://aws.amazon.com/console/.
You will also need to create an Amazon EC2 key pair and retrieve the relevant security credentials. The Spark documentation for EC2 (available at http://spark.apache.org/docs/latest/ec2-scripts.html) explains the requirements:
Create an Amazon EC2 key pair for yourself. This can be done by logging into your Amazon Web Services account through the AWS console, clicking on Key Pairs on the left sidebar, and creating and downloading a key. Make sure that you set the permissions for the private key file to 600 (that is, only you can read and write it) so that ssh will work.
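The permission step above can be sketched in the shell as follows. This is a minimal illustration, not part of the Spark scripts: the KEY_FILE variable and the spark-keypair.pem file name are hypothetical, and when KEY_FILE is not set the sketch creates an empty stand-in file in a temporary directory so that it can run without a real key.

```shell
# Hypothetical key file: set KEY_FILE to the .pem file you downloaded.
# If it is unset, create an empty stand-in in a temp directory (assumption,
# only so this sketch runs end to end without a real key).
if [ -z "${KEY_FILE:-}" ]; then
  KEY_FILE="$(mktemp -d)/spark-keypair.pem"
  touch "$KEY_FILE"
  chmod 644 "$KEY_FILE"   # simulate the looser permissions of a fresh download
fi

# Restrict the private key to owner read/write only, as ssh requires.
chmod 600 "$KEY_FILE"

# Read back the mode to confirm (GNU stat first, BSD/macOS stat as a fallback).
KEY_MODE="$(stat -c '%a' "$KEY_FILE" 2>/dev/null || stat -f '%Lp' "$KEY_FILE")"
echo "$KEY_FILE mode: $KEY_MODE"
```

To apply this to a real key, export KEY_FILE with the path of the downloaded key before running the commands.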
Whenever you want to use the spark-ec2 script, set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to your Amazon EC2