Alternatively, you can find the master's hostname by running:
./spark-ec2 get-master mycluster
Then SSH into it yourself using ssh -i keypair.pem root@masternode.
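If you prefer to script this, here is a minimal sketch combining the two steps; it assumes get-master prints the hostname on the last line of its output, which you should verify against your version of the script:
# capture the master's hostname, then SSH in
MASTER=$(./spark-ec2 get-master mycluster | tail -n 1)
ssh -i keypair.pem root@$MASTER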
Once you are in the cluster, you can use the Spark installation in /root/spark to run programs. This is a Standalone cluster installation, with the master URL spark://masternode:7077. If you launch an application with spark-submit from the cluster, it will automatically come configured to submit to this master. You can view the cluster's web UI at http://masternode:8080.
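For example, submitting the SparkPi example that ships with Spark might look like the following; the examples JAR path is illustrative, since its exact name varies with the Spark and Hadoop versions installed by spark-ec2:
# run the bundled SparkPi example against the standalone master
# (the examples JAR name varies by version; adjust to match yours)
/root/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://masternode:7077 \
  /root/spark/lib/spark-examples-*.jar 100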
Note that only programs launched from within the cluster can submit jobs to it with spark-submit; for security, the firewall rules prevent external hosts from doing so. To run a prepackaged application on the cluster, first copy it over using SCP:
scp -i mykeypair.pem app.jar root@masternode:~
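Then SSH into the master and launch the application with spark-submit; a sketch, where com.example.MyApp stands in for your application's actual main class:
# on the master node: submit the uploaded JAR to the cluster
# (com.example.MyApp is a placeholder for your main class)
/root/spark/bin/spark-submit \
  --class com.example.MyApp \
  --master spark://masternode:7077 \
  ~/app.jar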
Destroying a cluster
To destroy a cluster launched by spark-ec2, run:
./spark-ec2 destroy mycluster
This will terminate all the instances associated with the cluster (i.e., all instances in its two security groups, mycluster-master and mycluster-slaves).
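If you want to confirm that nothing is left running, one way is to query EC2 for instances still attached to those security groups; a sketch using the AWS CLI, assuming it is installed and configured for the cluster's region:
# list instances in the cluster's security groups and their states
aws ec2 describe-instances \
  --filters "Name=instance.group-name,Values=mycluster-master,mycluster-slaves" \
  --query "Reservations[].Instances[].[InstanceId,State.Name]" \
  --output table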
Pausing and restarting clusters
In addition to outright terminating clusters, spark-ec2 lets you stop the Amazon instances running your cluster and then start them again later. Stopping instances shuts them down and makes them lose all data on the "ephemeral" disks, which are configured with an installation of HDFS for spark-ec2 (see "Storage on the cluster" on page 138). However, the stopped instances retain all data in their root directory (e.g., any files you uploaded there), so you'll be able to quickly return to work.
To stop a cluster, use:
./spark-ec2 stop mycluster
Then, later, to start it up again:
./spark-ec2 -k mykeypair -i mykeypair.pem start mycluster
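Keep in mind that stopping and starting EC2 instances generally assigns them new public hostnames (unless you have attached Elastic IPs), so after a restart you will usually need to look up the master again:
./spark-ec2 get-master mycluster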