1. Copy a compiled version of Spark to the same location on all your machines, for example, /home/yourname/spark.
2. Set up password-less SSH access from your master machine to the others. This requires having the same user account on all the machines, creating a private SSH key for it on the master via ssh-keygen, and adding this key to the .ssh/authorized_keys file of all the workers. If you have not set this up before, you can follow these commands:
# On master: run ssh-keygen accepting default options
$ ssh-keygen -t dsa
Enter file in which to save the key (/home/you/.ssh/id_dsa): [ENTER]
Enter passphrase (empty for no passphrase): [EMPTY]
Enter same passphrase again: [EMPTY]

# On workers:
# copy ~/.ssh/id_dsa.pub from your master to the worker, then use:
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 644 ~/.ssh/authorized_keys
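The worker-side commands can be rehearsed in a throwaway directory to see exactly what they do; the key content and paths below are stand-ins, not a real key or real SSH paths:

```shell
# Simulate the worker-side steps in a scratch directory.
tmp=$(mktemp -d)

# Stand-in for the public key copied over from the master:
echo "ssh-dss AAAAB3Nza... you@master" > "$tmp/id_dsa.pub"

# Append the master's public key to the worker's authorized_keys:
cat "$tmp/id_dsa.pub" >> "$tmp/authorized_keys"

# Restrict permissions; sshd rejects an authorized_keys file that is
# writable by other users:
chmod 644 "$tmp/authorized_keys"
```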
3. Edit the conf/slaves file on your master and fill in the workers' hostnames.
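For instance, a conf/slaves file for a two-worker cluster might look like this (the hostnames are placeholders):

```
worker1.example.com
worker2.example.com
```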
4. To start the cluster, run sbin/start-all.sh on your master (it is important to run it there rather than on a worker). If everything started, you should get no prompts for a password, and the cluster manager's web UI should appear at http://masternode:8080 and show all your workers.
5. To stop the cluster, run sbin/stop-all.sh on your master node.
If you are not on a UNIX system or would like to launch the cluster manually, you can also start the master and workers by hand, using the spark-class script in Spark's bin/ directory. On your master, type:

bin/spark-class org.apache.spark.deploy.master.Master

Then on workers:

bin/spark-class org.apache.spark.deploy.worker.Worker spark://masternode:7077

(where masternode is the hostname of your master). On Windows, use \ instead of /.
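The per-worker launch step can be scripted. Below is a minimal sketch in which the hostnames, install path, and masternode are all placeholders; it prints the ssh command for each worker rather than running it, so you can inspect what would be executed (swap the echo for the ssh call to launch for real):

```shell
# Hypothetical launcher sketch: emit the Worker start command per host.
MASTER_URL="spark://masternode:7077"   # placeholder master URL
SPARK_HOME="/home/yourname/spark"      # placeholder install path

for host in worker1 worker2; do        # placeholder hostnames
  cmd="cd $SPARK_HOME && bin/spark-class org.apache.spark.deploy.worker.Worker $MASTER_URL"
  # Dry run; replace the echo with: ssh "$host" "$cmd" &
  echo "ssh $host '$cmd'"
done
```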
By default, the cluster manager detects the number of CPUs and the amount of memory on each worker and picks suitable defaults for Spark to use. More details on configuring the Standalone cluster manager are available in Spark's official documentation.
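For instance, per-worker resources can be capped by setting environment variables in conf/spark-env.sh on each machine; SPARK_WORKER_CORES and SPARK_WORKER_MEMORY are the Standalone manager's resource settings, and the values below are illustrative:

```
# conf/spark-env.sh (illustrative values)
export SPARK_WORKER_CORES=4    # cores this worker offers to Spark
export SPARK_WORKER_MEMORY=8g  # total memory executors may use on this worker
```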
Submitting applications
To submit an application to the Standalone cluster manager, pass spark://masternode:7077 as the master argument to spark-submit. For example: