--executor-memory
The amount of memory to use for executors, in bytes. Suffixes can be used to specify larger quantities
such as “512m” (512 megabytes) or “15g” (15 gigabytes).
--driver-memory
The amount of memory to use for the driver process, in bytes. Suffixes can be used to specify larger
quantities such as “512m” (512 megabytes) or “15g” (15 gigabytes).
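To make the suffix notation concrete, here is a minimal sketch of how a suffixed value maps to a byte count. The helper name to_bytes is hypothetical and is not part of Spark. It assumes binary multiples (1k = 1024 bytes), and it only illustrates the notation rather than reproducing Spark's own parser.

```shell
# Hypothetical helper: convert a size such as "512m" or "15g" to bytes.
# Assumes binary multiples (k = 1024, m = 1024^2, g = 1024^3).
to_bytes() {
  size="$1"
  number="${size%[kmgKMG]}"   # strip one trailing unit letter, if present
  case "$size" in
    *[kK]) echo $(($number * 1024)) ;;
    *[mM]) echo $(($number * 1024 * 1024)) ;;
    *[gG]) echo $(($number * 1024 * 1024 * 1024)) ;;
    *)     echo "$number" ;;   # no suffix: already a plain byte count
  esac
}

to_bytes 512m   # prints 536870912
to_bytes 15g    # prints 16106127360
```

In practice the suffixed form is passed straight to the flag, as in --executor-memory 10g, and no manual conversion is needed.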
spark-submit also allows setting arbitrary SparkConf configuration options, either
with the --conf prop=value flag or by supplying a properties file containing
key/value pairs through --properties-file. Chapter 8 will discuss Spark's
configuration system.
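As a rough sketch of these two mechanisms, the snippet below writes a small properties file and assembles a spark-submit command line that uses both --properties-file and --conf. The property values, the class name com.example.MyApp, and my_app.jar are hypothetical placeholders. The command is only printed, not executed, so it can be inspected without a Spark installation.

```shell
# Write a hypothetical properties file: one key/value pair per line,
# with the key and value separated by whitespace.
conf_file="$(mktemp)"
cat > "$conf_file" <<'EOF'
spark.master     local[4]
spark.app.name   MyApp
spark.ui.port    36000
EOF

# Assemble the command line. --conf sets a single property directly,
# while --properties-file points at the file written above.
submit_cmd="./bin/spark-submit --class com.example.MyApp --properties-file $conf_file --conf spark.executor.memory=2g my_app.jar"

# Print rather than run, since this sketch assumes no Spark installation.
echo "$submit_cmd"
```

Keeping shared settings in a properties file and reserving --conf for one-off overrides is one reasonable way to divide the two, though either mechanism can carry any property.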
Example 7-4 shows a few longer-form invocations of spark-submit using various options.
Example 7-4. Using spark-submit with various options
# Submitting a Java application to Standalone cluster mode
$ ./bin/spark-submit \
--master spark://hostname:7077 \
--deploy-mode cluster \
--class com.databricks.examples.SparkExample \
--name "Example Program" \
--jars dep1.jar,dep2.jar,dep3.jar \
--total-executor-cores 300 \
--executor-memory 10g \
myApp.jar "options" "to your application" "go here"
# Submitting a Python application in YARN client mode
$ export HADOOP_CONF_DIR=/opt/hadoop/conf
$ ./bin/spark-submit \
--master yarn \
--py-files somelib-1.2.egg \
--deploy-mode client \
--name "Example Program" \
--queue exampleQueue \
--num-executors 40 \
--executor-memory 10g \
my_script.py "options" "to your application" "go here"
Packaging Your Code and Dependencies
Throughout most of this topic we've provided example programs that are
self-contained and have no library dependencies outside of Spark. More often, user
programs depend on third-party libraries. If your program imports any libraries that are
not in the org.apache.spark package or part of the language library, you need to