spark-submit --master spark://masternode:7077 yourapp
This cluster URL is also shown in the Standalone cluster manager's web UI, at http://masternode:8080. Note that the hostname and port used during submission must exactly match the URL present in the UI. This can trip up users who try to use an IP address, for instance, instead of a hostname. Even if the IP address is associated with the same host, submission will fail if the naming doesn't match exactly. Some administrators might also configure Spark to use a port other than 7077. To ensure consistency of the host and port components, one safe bet is to simply copy and paste the URL directly from the master node's web UI.
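To illustrate the mismatch, the following pair of submissions sketches the idea (the IP address is hypothetical; assume the master registered itself under the hostname masternode, as reported by its web UI):
# May fail: the IP resolves to the same machine, but does not match
# the spark://masternode:7077 URL the master registered under
spark-submit --master spark://192.168.1.10:7077 yourapp
# Succeeds: matches the URL shown in the master's web UI exactly
spark-submit --master spark://masternode:7077 yourapp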
You can also launch spark-shell or pyspark against the cluster in the same way, by
passing the --master parameter:
spark-shell --master spark://masternode:7077
pyspark --master spark://masternode:7077
To check that your application or shell is running, look at the cluster manager's web UI at http://masternode:8080 and make sure that (1) your application is connected (i.e., it appears under Running Applications) and (2) it is listed as having more than 0 cores and memory.
A common pitfall that might prevent your application from running is requesting more memory for executors (with the --executor-memory flag to spark-submit) than is available in the cluster. In this case, the Standalone cluster manager will never allocate executors for the application. Make sure that the value your application is requesting can be satisfied by the cluster.
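For example, a submission along the following lines (the 8g value is hypothetical) will sit with zero allocated cores and memory if no worker in the cluster can offer that much executor memory:
spark-submit --master spark://masternode:7077 --executor-memory 8g yourapp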
Finally, the Standalone cluster manager supports two deploy modes for where the driver program of your application runs. In client mode (the default), the driver runs on the machine where you executed spark-submit, as part of the spark-submit command. This means that you can directly see the output of your driver program, or send input to it (e.g., for an interactive shell), but it requires the machine from which your application was submitted to have fast connectivity to the workers and to stay available for the duration of your application. In contrast, in cluster mode, the driver is launched within the Standalone cluster, as another process on one of the worker nodes, and then it connects back to request executors. In this mode spark-submit is "fire-and-forget" in that you can close your laptop while the application is running. You will still be able to access logs for the application through the cluster manager's web UI. You can switch to cluster mode by passing --deploy-mode cluster to spark-submit.
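For instance, a cluster-mode submission of the same application might look like this (yourapp stands in for your application, as in the earlier examples):
spark-submit --master spark://masternode:7077 --deploy-mode cluster yourapp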