1. Copy a compiled version of Spark to the same location on all your machines, for example, /home/yourname/spark.
2. Set up password-less SSH access from your master machine to the others. This requires having the same user account on all the machines, creating a private SSH key for it on the master via ssh-keygen, and adding this key to the .ssh/authorized_keys file of all the workers. If you have not set this up before, you can follow these commands:
# On master: run ssh-keygen accepting default options
$ ssh-keygen -t dsa
Enter file in which to save the key (/home/you/.ssh/id_dsa): [ENTER]
Enter passphrase (empty for no passphrase): [EMPTY]
Enter same passphrase again: [EMPTY]

# On workers:
# copy ~/.ssh/id_dsa.pub from your master to the worker, then use:
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 644 ~/.ssh/authorized_keys
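The worker-side commands can be rehearsed in a throwaway directory to see exactly what they do; the key content and paths below are stand-ins, not a real key or real SSH paths:

```shell
# Simulate the worker-side steps in a scratch directory.
tmp=$(mktemp -d)

# Stand-in for the public key copied over from the master:
echo "ssh-dss AAAAB3Nza... you@master" > "$tmp/id_dsa.pub"

# Append the master's public key to the worker's authorized_keys:
cat "$tmp/id_dsa.pub" >> "$tmp/authorized_keys"

# Restrict permissions; sshd rejects an authorized_keys file that is
# writable by other users:
chmod 644 "$tmp/authorized_keys"
```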
3. Edit the conf/slaves file on your master and fill in the workers' hostnames.
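For instance, a conf/slaves file for a two-worker cluster might look like this (the hostnames are placeholders):

```
worker1.example.com
worker2.example.com
```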
4. To start the cluster, run sbin/start-all.sh on your master (it is important to run it there rather than on a worker). If everything started, you should get no prompts for a password, and the cluster manager's web UI should appear at http://masternode:8080 and show all your workers.
5. To stop the cluster, run sbin/stop-all.sh on your master node.
If you are not on a UNIX system or would like to launch the cluster manually, you can also start the master and workers by hand, using the spark-class script in Spark's bin/ directory. On your master, type:

bin/spark-class org.apache.spark.deploy.master.Master

Then on workers:

bin/spark-class org.apache.spark.deploy.worker.Worker spark://masternode:7077

(where masternode is the hostname of your master). On Windows, use \ instead of /.
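The per-worker launch step can be scripted. Below is a minimal sketch in which the hostnames, install path, and masternode are all placeholders; it prints the ssh command for each worker rather than running it, so you can inspect what would be executed (swap the echo for the ssh call to launch for real):

```shell
# Hypothetical launcher sketch: emit the Worker start command per host.
MASTER_URL="spark://masternode:7077"   # placeholder master URL
SPARK_HOME="/home/yourname/spark"      # placeholder install path

for host in worker1 worker2; do        # placeholder hostnames
  cmd="cd $SPARK_HOME && bin/spark-class org.apache.spark.deploy.worker.Worker $MASTER_URL"
  # Dry run; replace the echo with: ssh "$host" "$cmd" &
  echo "ssh $host '$cmd'"
done
```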
By default, the cluster manager detects the number of CPUs and the amount of memory on each worker and picks suitable defaults for Spark to use. More details on configuring the Standalone cluster manager are available in Spark's official documentation.
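For instance, per-worker resources can be capped by setting environment variables in conf/spark-env.sh on each machine; SPARK_WORKER_CORES and SPARK_WORKER_MEMORY are the Standalone manager's resource settings, and the values below are illustrative:

```
# conf/spark-env.sh (illustrative values)
export SPARK_WORKER_CORES=4    # cores this worker offers to Spark
export SPARK_WORKER_MEMORY=8g  # total memory executors may use on this worker
```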
Submitting applications
To submit an application to the Standalone cluster manager, pass spark://masternode:7077 as the master argument to spark-submit. For example: