1. Copy a compiled version of Spark to the same location on all your machines—for
example, /home/yourname/spark.
2. Set up password-less SSH access from your master machine to the others. This
requires having the same user account on all the machines, creating a private
SSH key for it on the master via ssh-keygen , and adding this key to the .ssh/
authorized_keys file of all the workers. If you have not set this up before, you can
follow these commands:
# On master: run ssh-keygen accepting default options
$ ssh-keygen -t dsa
Enter file in which to save the key (/home/you/.ssh/id_dsa): [ENTER]
Enter passphrase (empty for no passphrase): [EMPTY]
Enter same passphrase again: [EMPTY]
# On workers:
# copy ~/.ssh/id_dsa.pub from your master to the worker, then use:
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 644 ~/.ssh/authorized_keys
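On many systems, ssh-copy-id (shipped with OpenSSH) performs the copy-and-append
steps in one command; here you@worker1 stands in for your account on each worker:
$ ssh-copy-id -i ~/.ssh/id_dsa.pub you@worker1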
3. Edit the conf/slaves file on your master and fill in the workers' hostnames, one
per line.
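For instance, with two hypothetical workers named worker1 and worker2, conf/slaves
would contain just:
worker1
worker2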
4. To start the cluster, run sbin/start-all.sh on your master (it is important to
run it there rather than on a worker). If everything started successfully, you
should get no password prompts, and the cluster manager's web UI should appear at
http://masternode:8080 and show all your workers.
5. To stop the cluster, run sbin/stop-all.sh on your master node.
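As a quick sanity check, jps (shipped with the JDK) should list a Master process on
the master node and a Worker process on each worker; the PID shown here is
illustrative:
$ jps
8123 Master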
If you are not on a UNIX system or would like to launch the cluster manually, you
can also start the master and workers by hand, using the spark-class script in
Spark's bin/ directory. On your master, type:
bin/spark-class org.apache.spark.deploy.master.Master
Then on workers:
bin/spark-class org.apache.spark.deploy.worker.Worker spark://masternode:7077
(where masternode is the hostname of your master). On Windows, use \ instead of /.
By default, the cluster manager automatically detects the number of CPU cores and
the amount of memory available on each worker and picks suitable defaults for Spark
to use. More details on configuring the Standalone cluster manager are available in
Spark's official documentation.
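To override these defaults yourself, one option (a sketch using variables from
Spark's standard conf/ template; the values shown are illustrative) is to
set each worker's resources before starting the cluster:
# In conf/ on each worker
export SPARK_WORKER_CORES=4     # number of cores Spark may use on this worker
export SPARK_WORKER_MEMORY=8g   # total memory Spark executors may use on this worker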
Submitting applications
To submit an application to the Standalone cluster manager, pass
spark://masternode:7077 as the master argument to spark-submit. For example:
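A minimal sketch (my-script.py is a hypothetical PySpark application; adjust the
path to spark-submit for your installation):
$ bin/spark-submit --master spark://masternode:7077 my-script.py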