Running on a Cluster - Learning Spark


export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
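Because the launch step depends on these two variables, it can help to fail fast when either one is missing. The POSIX shell function below is a minimal sketch of such a pre-flight check; the name check_aws_credentials is hypothetical and is not part of spark-ec2.

```shell
# Hypothetical pre-flight check (not part of spark-ec2): report whichever of
# the two AWS credential variables is unset or empty, and return non-zero
# if at least one of them is missing.
check_aws_credentials() {
  rc=0
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
    echo "error: AWS_ACCESS_KEY_ID is not set" >&2
    rc=1
  fi
  if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "error: AWS_SECRET_ACCESS_KEY is not set" >&2
    rc=1
  fi
  return "$rc"
}

# Usage: check_aws_credentials && ./spark-ec2 -k mykeypair -i mykeypair.pem launch mycluster
```

Chaining with && means the launch is only attempted when both variables are present.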

In addition, create an EC2 SSH key pair and download its private key file (usually called keypair.pem) so that you can SSH into the machines.
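OpenSSH generally refuses to use a private key file that other users can read, so a freshly downloaded .pem file usually needs its permissions tightened before you can SSH with it. A minimal sketch, using a hypothetical helper name:

```shell
# Hypothetical helper (not part of spark-ec2): restrict a downloaded EC2
# private key so that only its owner can read it (mode 400). ssh rejects
# private keys that are readable by group or others.
secure_keyfile() {
  keyfile=$1
  if [ ! -f "$keyfile" ]; then
    echo "error: $keyfile not found" >&2
    return 1
  fi
  chmod 400 "$keyfile"
}

# Usage: secure_keyfile mykeypair.pem
```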

Next, run the launch command of the spark-ec2 script, giving it your key pair name, private key file, and a name for the cluster. By default, this will launch a cluster with one master and one slave, using m1.xlarge EC2 instances:

cd /path/to/spark/ec2
./spark-ec2 -k mykeypair -i mykeypair.pem launch mycluster

You can also configure the instance types, number of slaves, EC2 region, and other factors using options to spark-ec2. For example:

# Launch a cluster with 5 slaves of type m3.xlarge
./spark-ec2 -k mykeypair -i mykeypair.pem -s 5 -t m3.xlarge launch mycluster

For a full list of options, run spark-ec2 --help. Some of the most common ones are listed in Table 7-3.

Table 7-3. Common options to spark-ec2

Option                 Meaning
-k KEYPAIR             Name of key pair to use
-i IDENTITY_FILE       Private key file (ending in .pem)
-s NUM_SLAVES          Number of slave nodes
-t INSTANCE_TYPE       Amazon instance type to use
-r REGION              Amazon region to use (e.g., us-west-1)
-z ZONE                Availability zone (e.g., us-west-1b)
--spot-price=PRICE     Use spot instances at the given spot price (in US dollars)
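To show how the options in Table 7-3 combine, here is a sketch of a shell function that assembles a launch command and prints it for review instead of running it. The function name and the variables NUM_SLAVES, INSTANCE_TYPE, REGION, ZONE, and SPOT_PRICE are hypothetical; each flag is added only when its variable is set, so spark-ec2 keeps its own defaults otherwise.

```shell
# Hypothetical sketch (not part of spark-ec2): build a launch command from
# the options in Table 7-3. Optional flags are appended only when the
# matching variable is non-empty; the result is printed, not executed.
build_launch_cmd() {
  keypair=$1 keyfile=$2 cluster=$3
  # Reuse the function's positional parameters as the argument list.
  set -- ./spark-ec2 -k "$keypair" -i "$keyfile"
  if [ -n "${NUM_SLAVES:-}" ]; then set -- "$@" -s "$NUM_SLAVES"; fi
  if [ -n "${INSTANCE_TYPE:-}" ]; then set -- "$@" -t "$INSTANCE_TYPE"; fi
  if [ -n "${REGION:-}" ]; then set -- "$@" -r "$REGION"; fi
  if [ -n "${ZONE:-}" ]; then set -- "$@" -z "$ZONE"; fi
  if [ -n "${SPOT_PRICE:-}" ]; then set -- "$@" --spot-price="$SPOT_PRICE"; fi
  set -- "$@" launch "$cluster"
  echo "$@"
}

# Usage: NUM_SLAVES=5 INSTANCE_TYPE=m3.xlarge build_launch_cmd mykeypair mykeypair.pem mycluster
```

With NUM_SLAVES=5 and INSTANCE_TYPE=m3.xlarge, the printed line matches the five-slave example earlier in this section. Printing rather than executing keeps the sketch safe to experiment with; once the line looks right, you can run it directly.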

After you run the launch command, it usually takes about five minutes for spark-ec2 to launch the machines, log in to them, and set up Spark.

Logging in to a cluster

You can log in to a cluster by SSHing into its master node with the .pem file for your key pair. For convenience, spark-ec2 provides a login command for this purpose:

./spark-ec2 -k mykeypair -i mykeypair.pem login mycluster
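For reference, a rough manual equivalent of the login command is sketched below. It assumes you already know the master's public hostname (for example, from the EC2 console) and that the master accepts SSH logins as root, which is believed to be spark-ec2's default SSH user but is stated here as an assumption. The helper name ssh_to_master and the DRY_RUN variable are hypothetical.

```shell
# Hypothetical helper (not part of spark-ec2): a rough manual equivalent of
# "spark-ec2 ... login mycluster". Arguments: key file, master hostname,
# and an optional SSH user (assumed to default to root).
# With DRY_RUN=1 the ssh command is printed instead of run.
ssh_to_master() {
  keyfile=$1 master=$2 user=${3:-root}
  set -- ssh -i "$keyfile" "$user@$master"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Usage: ssh_to_master mykeypair.pem <master-public-hostname>
```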
