Deploying Applications with spark-submit
As you've learned, Spark provides a single tool for submitting jobs across all cluster
managers, called spark-submit. In Chapter 2 you saw a simple example of submitting
a Python program with spark-submit, repeated here in Example 7-1.
Example 7-1. Submitting a Python application
bin/spark-submit my_script.py
When spark-submit is called with nothing but the name of a script or JAR, it simply
runs the supplied Spark program locally. Let's say we wanted to submit this program
to a Spark Standalone cluster instead. We can provide extra flags with the address of
a Standalone cluster and the amount of memory we'd like each executor process to
use, as shown in Example 7-2.
Example 7-2. Submitting an application with extra arguments
bin/spark-submit --master spark://host:7077 --executor-memory 10g my_script.py
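To make the flag composition concrete, here is a small illustrative Python helper (not part of Spark; the function name is invented for this sketch) that assembles a spark-submit command line from the two options shown above:

```python
# Illustrative sketch: build a spark-submit argument list from optional flags.
# build_submit_cmd is a hypothetical helper, not a Spark API.
def build_submit_cmd(script, master=None, executor_memory=None):
    cmd = ["bin/spark-submit"]
    if master:
        cmd += ["--master", master]          # cluster URL, e.g. spark://host:7077
    if executor_memory:
        cmd += ["--executor-memory", executor_memory]  # per-executor heap, e.g. "10g"
    cmd.append(script)                       # the application script or JAR goes last
    return cmd

# Reproduces the command line from Example 7-2:
print(build_submit_cmd("my_script.py",
                       master="spark://host:7077",
                       executor_memory="10g"))
```

A list like this could be passed directly to subprocess.run, which avoids shell-quoting issues with values such as local[*].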
The --master flag specifies a cluster URL to connect to; in this case, the spark:// URL
means a cluster using Spark's Standalone mode (see Table 7-1). We will discuss other
URL types later.
Table 7-1. Possible values for the --master flag in spark-submit

Value               Explanation
spark://host:port   Connect to a Spark Standalone cluster at the specified port. By default
                    Spark Standalone masters use port 7077.
mesos://host:port   Connect to a Mesos cluster master at the specified port. By default
                    Mesos masters listen on port 5050.
yarn                Connect to a YARN cluster. When running on YARN you'll need to set the
                    HADOOP_CONF_DIR environment variable to point to the location of your
                    Hadoop configuration directory, which contains information about the
                    cluster.
local               Run in local mode with a single core.
local[N]            Run in local mode with N cores.
local[*]            Run in local mode and use as many cores as the machine has.
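For reference, the master values in Table 7-1 would be used like this (the host names and core counts below are placeholders, not commands to run verbatim; these require a Spark installation):

```shell
# Standalone cluster (default master port 7077)
bin/spark-submit --master spark://masterhost:7077 my_script.py

# Mesos cluster (default master port 5050)
bin/spark-submit --master mesos://masterhost:5050 my_script.py

# YARN cluster; HADOOP_CONF_DIR must point to the Hadoop configuration directory
export HADOOP_CONF_DIR=/etc/hadoop/conf
bin/spark-submit --master yarn my_script.py

# Local mode: one core, four cores, or all available cores
bin/spark-submit --master local my_script.py
bin/spark-submit --master "local[4]" my_script.py
bin/spark-submit --master "local[*]" my_script.py
```

Note that local[*] is quoted so the shell does not treat the brackets as a glob pattern.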
Apart from a cluster URL, spark-submit provides a variety of options that let you
control specific details about a particular run of your application. These options fall
 