Flag: --executor-memory
Explanation: The amount of memory to use for executors, in bytes. Suffixes can be used to specify larger quantities such as "512m" (512 megabytes) or "15g" (15 gigabytes).

Flag: --driver-memory
Explanation: The amount of memory to use for the driver process, in bytes. Suffixes can be used to specify larger quantities such as "512m" (512 megabytes) or "15g" (15 gigabytes).
spark-submit also allows setting arbitrary SparkConf configuration options using either the --conf prop=value flag or providing a properties file through --properties-file that contains key/value pairs. Chapter 8 will discuss Spark's configuration system.
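As a sketch of these two mechanisms (spark.eventLog.enabled and spark.serializer are real Spark properties, but the file name my-config.conf, the values, and the script name are illustrative assumptions, not from the examples in this chapter):

```shell
# Illustrative only: passing a single configuration value on the command line.
$ ./bin/spark-submit \
  --master local \
  --conf spark.eventLog.enabled=false \
  my_script.py

# The same settings can live in a properties file of key/value pairs;
# my-config.conf is an assumed example file.
$ cat my-config.conf
spark.eventLog.enabled  false
spark.serializer        org.apache.spark.serializer.KryoSerializer
$ ./bin/spark-submit --properties-file my-config.conf my_script.py
```

Either form ends up in the application's SparkConf; --conf is convenient for one-off overrides, while a properties file suits settings shared across many submissions.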
Example 7-4 shows a few longer-form invocations of spark-submit using various options.
Example 7-4. Using spark-submit with various options

# Submitting a Java application to Standalone cluster mode
$ ./bin/spark-submit \
  --master spark://hostname:7077 \
  --deploy-mode cluster \
  --class com.databricks.examples.SparkExample \
  --name "Example Program" \
  --jars dep1.jar,dep2.jar,dep3.jar \
  --total-executor-cores 300 \
  --executor-memory 10g \
  myApp.jar "options" "to your application" "go here"
# Submitting a Python application in YARN client mode
$ export HADOOP_CONF_DIR=/opt/hadoop/conf
$ ./bin/spark-submit \
  --master yarn \
  --py-files somelib-1.2.egg,otherlib-4.4.zip,other-file.py \
  --deploy-mode client \
  --name "Example Program" \
  --queue exampleQueue \
  --num-executors 40 \
  --executor-memory 10g \
  my_script.py "options" "to your application" "go here"
Packaging Your Code and Dependencies
Throughout most of this book we've provided example programs that are self-contained and have no library dependencies outside of Spark. More often, user programs depend on third-party libraries. If your program imports any libraries that are not in the org.apache.spark package or part of the language library, you need to