roughly into two categories. The first is scheduling information, such as the amount
of resources you'd like to request for your job (as shown in Example 7-2). The second
is information about the runtime dependencies of your application, such as libraries
or files you want to deploy to all worker machines.
The general format for spark-submit is shown in Example 7-3.
Example 7-3. General format for spark-submit
bin/spark-submit [options] <app jar | python file> [app options]
[options] is a list of flags for spark-submit. You can list all available flags
by running spark-submit --help. The most common flags are described in Table 7-2.
<app jar | python file> refers to the JAR or Python script containing the entry
point into your application.
[app options] are options that will be passed on to your application. If the
main() method of your program parses its calling arguments, it will see only
[app options] and not the flags specific to spark-submit.
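To illustrate the split between spark-submit flags and [app options], here is a minimal sketch of a Python entry point that parses its own arguments. The option names --input and --output, and the function names, are hypothetical and chosen only for this example. If the script were launched as bin/spark-submit --master local my_script.py --input in.txt --output out, the application would receive only the last four tokens.

```python
import argparse
import sys


def parse_app_options(argv):
    """Parse the application's own options (hypothetical --input/--output)."""
    parser = argparse.ArgumentParser(description="Hypothetical Spark application")
    parser.add_argument("--input", required=True, help="path to read from")
    parser.add_argument("--output", required=True, help="path to write to")
    return parser.parse_args(argv)


def main(argv=None):
    # sys.argv[0] is the script path; sys.argv[1:] holds only the
    # [app options]. Flags such as --master are consumed by spark-submit
    # and never reach the application.
    opts = parse_app_options(sys.argv[1:] if argv is None else argv)
    return opts.input, opts.output
```

A real script would call main() at the bottom of the file and use the returned paths when building its Spark job.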
Table 7-2. Common flags for spark-submit
Flag            Explanation
--master        Indicates the cluster manager to connect to. The options for this flag are described in Table 7-1.
--deploy-mode   Whether to launch the driver program locally (“client”) or on one of the worker machines inside the cluster (“cluster”). In client mode, spark-submit will run your driver on the same machine where spark-submit is itself being invoked. In cluster mode, the driver will be shipped to execute on a worker node in the cluster. The default is client mode.
--class         The “main” class of your application if you're running a Java or Scala program.
--name          A human-readable name for your application. This will be displayed in Spark's web UI.
--jars          A list of JAR files to upload and place on the classpath of your application. If your application depends on a small number of third-party JARs, you can add them here.
--files         A list of files to be placed in the working directory of your application. This can be used for data files that you want to distribute to each node.
--py-files      A list of files to be added to the PYTHONPATH of your application. This can contain .py, .egg, or .zip files.
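As a rough sketch of how the flags in Table 7-2 fit into the general format from Example 7-3, the helper below assembles a spark-submit argument list. The function name and its parameters are hypothetical and not part of Spark. It also assumes that the list-valued flags (--jars, --files, --py-files) take comma-separated values, and that local is one of the --master values listed in Table 7-1.

```python
def build_spark_submit_command(app, app_options=(), master=None, deploy_mode=None,
                               main_class=None, name=None,
                               jars=(), files=(), py_files=()):
    """Assemble: bin/spark-submit [options] <app jar | python file> [app options]."""
    cmd = ["bin/spark-submit"]

    # Single-valued flags are emitted only when a value was supplied.
    for flag, value in (("--master", master),
                        ("--deploy-mode", deploy_mode),
                        ("--class", main_class),
                        ("--name", name)):
        if value:
            cmd += [flag, value]

    # List-valued flags are joined with commas (assumed format, see lead-in).
    for flag, values in (("--jars", jars),
                         ("--files", files),
                         ("--py-files", py_files)):
        if values:
            cmd += [flag, ",".join(values)]

    # The application comes after all spark-submit flags, followed by the
    # options intended for the application itself.
    cmd.append(app)
    cmd.extend(app_options)
    return cmd


example = build_spark_submit_command(
    "my_script.py",                      # hypothetical application file
    app_options=["--input", "data.txt"],
    master="local",
    name="Example",
    py_files=["deps.zip", "helpers.py"],
)
print(" ".join(example))
```

Returning a list rather than a single string keeps each argument separate, so it could be handed to subprocess.run without shell quoting concerns.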