memory used by each executor via --executor-memory and the number of cores it claims from YARN via --executor-cores. On a given set of hardware resources, Spark will usually run better with a smaller number of larger executors (with multiple cores and more memory), since it can optimize communication within each executor. Note, however, that some clusters have a limit on the maximum size of an executor (8 GB by default), and will not let you launch larger ones.
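For example, a YARN submission that requests larger executors might look like the following (the application name and resource values here are only illustrative):

spark-submit --master yarn --executor-memory 4g --executor-cores 4 yourapp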
Some YARN clusters are configured to schedule applications into multiple “queues”
for resource management purposes. Use the --queue option to select your queue
name.
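For instance, to submit into a queue named production (a hypothetical queue name), you would add the option like so:

spark-submit --master yarn --queue production yourapp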
Finally, further information on configuration options for YARN is available in the
official Spark documentation.
Apache Mesos
Apache Mesos is a general-purpose cluster manager that can run both analytics workloads and long-running services (e.g., web applications or key/value stores) on a cluster. To use Spark on Mesos, pass a mesos:// URI to spark-submit:
spark-submit --master mesos://masternode:5050 yourapp
You can also configure Mesos clusters to use ZooKeeper to elect a master when running in multimaster mode. In this case, use a mesos://zk:// URI pointing to a list of ZooKeeper nodes. For example, if you have three ZooKeeper nodes (node1, node2, and node3) on which ZooKeeper is running on port 2181, use the following URI:
mesos://zk://node1:2181/mesos,node2:2181/mesos,node3:2181/mesos
Mesos scheduling modes
Unlike the other cluster managers, Mesos offers two modes to share resources between executors on the same cluster. In “fine-grained” mode, which is the default, executors scale up and down the number of CPUs they claim from Mesos as they execute tasks, and so a machine running multiple executors can dynamically share CPU resources between them. In “coarse-grained” mode, Spark allocates a fixed number of CPUs to each executor in advance and never releases them until the application ends, even if the executor is not currently running tasks. You can enable coarse-grained mode by passing --conf spark.mesos.coarse=true to spark-submit.
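As a sketch, a submission that enables coarse-grained mode on the Mesos cluster shown earlier might look like this (the master hostname and application name are placeholders):

spark-submit --master mesos://masternode:5050 --conf spark.mesos.coarse=true yourapp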
The fine-grained Mesos mode is attractive when multiple users share a cluster to run interactive workloads such as shells, because applications will scale down their number of cores when they're not doing work and still allow other users' programs to use the cluster. The downside, however, is that scheduling tasks through fine-grained mode adds more latency (so very low-latency applications like Spark Streaming may suffer), and that applications may need to wait some amount of time for CPU cores