Configuring resource usage
When sharing a Spark cluster among multiple applications, you will need to decide how to allocate resources between the executors. The Standalone cluster manager has a basic scheduling policy that allows capping the usage of each application so that multiple applications may run concurrently. Apache Mesos supports more dynamic sharing while an application is running, whereas YARN has a concept of queues that allows you to cap usage for various sets of applications.
In the Standalone cluster manager, resource allocation is controlled by two settings:
Executor memory
You can configure this using the --executor-memory argument to spark-submit. Each application will have at most one executor on each worker, so this setting controls how much of that worker's memory the application will claim. By default, this setting is 1 GB; you will likely want to increase it on most servers.
The maximum number of total cores
This is the total number of cores used across all executors for an application. By default, this is unlimited; that is, the application will launch executors on every available node in the cluster. For a multiuser workload, you should instead ask users to cap their usage. You can set this value through the --total-executor-cores argument to spark-submit, or by configuring spark.cores.max in your Spark configuration file.
To verify the settings, you can always see the current resource allocation in the Standalone cluster manager's web UI, http://masternode:8080.
Finally, the Standalone cluster manager works by spreading out each application across the maximum number of executors by default. For example, suppose that you have a 20-node cluster with 4-core machines, and you submit an application with --executor-memory 1G and --total-executor-cores 8. Then Spark will launch eight executors, each with 1 GB of RAM, on different machines. Spark does this by default to give applications a chance to achieve data locality for distributed filesystems running on the same machines (e.g., HDFS), because these systems typically have data spread out across all nodes. If you prefer, you can instead ask Spark to consolidate executors on as few nodes as possible, by setting the config property spark.deploy.spreadOut to false in conf/spark-defaults.conf. In this case, the preceding application would get only two executors, each with 1 GB RAM and four cores. This setting affects all applications on the Standalone cluster and must be configured before you launch the Standalone cluster manager.
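As a rough sketch of that consolidation setup, conf/spark-defaults.conf could contain entries like the following; the memory and core values are illustrative assumptions that mirror the earlier example, and spark.executor.memory is simply the file-based counterpart of --executor-memory:

    # conf/spark-defaults.conf (illustrative values)
    # Read by the Standalone master at startup:
    spark.deploy.spreadOut   false   # consolidate executors onto as few nodes as possible
    # Per-application defaults, equivalent to the spark-submit flags shown earlier:
    spark.executor.memory    1g
    spark.cores.max          8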