export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
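Before launching anything, it can help to confirm that both variables are actually set in the current shell. The following is a minimal sketch; the function name check_aws_credentials is hypothetical and is not part of spark-ec2:

```shell
# Hypothetical helper (not part of spark-ec2): returns 0 if both AWS
# credential variables are set and non-empty, 1 otherwise.
check_aws_credentials() {
  status=0
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
    echo "AWS_ACCESS_KEY_ID is not set" >&2
    status=1
  fi
  if [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_SECRET_ACCESS_KEY is not set" >&2
    status=1
  fi
  return "$status"
}

# Usage: check_aws_credentials && echo "credentials found"
```

Note that the shell does not allow spaces around the = sign in an export statement.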
In addition, create an EC2 SSH key pair and download its private key file (usually
called keypair.pem) so that you can SSH into the machines.
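SSH clients generally refuse a private key file that other users can read, so it is common to restrict the file's permissions after downloading it. The sketch below assumes a POSIX shell; the function name secure_key_file is hypothetical:

```shell
# Hypothetical helper: make a private key file readable by its owner only,
# then print the resulting permission string (the first 10 characters of ls -l).
secure_key_file() {
  key_file="$1"
  if [ ! -f "$key_file" ]; then
    echo "no such file: $key_file" >&2
    return 1
  fi
  chmod 400 "$key_file" && ls -l "$key_file" | cut -c1-10
}

# Usage: secure_key_file keypair.pem
```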
Next, run the launch command of the spark-ec2 script, giving it your key pair name,
private key file, and a name for the cluster. By default, this will launch a cluster with
one master and one slave, using m1.xlarge EC2 instances:
cd /path/to/spark/ec2
./spark-ec2 -k mykeypair -i mykeypair.pem launch mycluster
You can also configure the instance types, number of slaves, EC2 region, and other
factors using options to spark-ec2. For example:
# Launch a cluster with 5 slaves of type m3.xlarge
./spark-ec2 -k mykeypair -i mykeypair.pem -s 5 -t m3.xlarge launch mycluster
For a full list of options, run spark-ec2 --help. Some of the most common ones are
listed in Table 7-3.
Table 7-3. Common options to spark-ec2

Option               Meaning
-k KEYPAIR           Name of key pair to use
-i IDENTITY_FILE     Private key file (ending in .pem)
-s NUM_SLAVES        Number of slave nodes
-t INSTANCE_TYPE     Amazon instance type to use
-r REGION            Amazon region to use (e.g., us-west-1)
-z ZONE              Availability zone (e.g., us-west-1b)
--spot-price=PRICE   Use spot instances at the given spot price (in US dollars)

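The options in Table 7-3 can be combined in a single launch command. As an illustration, the sketch below prints (rather than runs) a command that sets the region, availability zone, and spot price. The function name print_spot_launch is hypothetical, and the price in the usage line is a placeholder, not a recommendation:

```shell
# Hypothetical helper: print, without running, a spark-ec2 launch command
# that uses the region, zone, and spot-price options from Table 7-3.
# Arguments: key pair name, identity file, region, zone, spot price, cluster name.
print_spot_launch() {
  printf './spark-ec2 -k %s -i %s -r %s -z %s --spot-price=%s launch %s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# Usage (0.10 is an illustrative price only):
# print_spot_launch mykeypair mykeypair.pem us-west-1 us-west-1b 0.10 mycluster
```

Printing the command first makes it easy to review the options before any machines are started.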
Once you launch the script, it usually takes about five minutes to launch the
machines, log in to them, and set up Spark.
Logging in to a cluster
You can log in to a cluster by SSHing into its master node with the .pem file for your
key pair. For convenience, spark-ec2 provides a login command for this purpose:
./spark-ec2 -k mykeypair -i mykeypair.pem login mycluster
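For comparison, logging in by hand means passing the .pem file to ssh yourself. The sketch below only prints the command. The function name print_ssh_login is hypothetical, the hostname in the usage line is a placeholder, and the default login user of root is an assumption that may not hold for every machine image:

```shell
# Hypothetical helper: print, without running, the ssh command for a direct
# login to the master node.
# Arguments: identity file, master hostname, optional login user.
# Assumption: the login user defaults to root; change it if your image differs.
print_ssh_login() {
  printf 'ssh -i %s %s@%s\n' "$1" "${3:-root}" "$2"
}

# Usage (master.example.com is a placeholder hostname):
# print_ssh_login mykeypair.pem master.example.com
```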