Executors and Cluster Managers
We have seen how Spark relies on executors to run the tasks that make up a Spark job, but
we glossed over how the executors actually get started. Managing the lifecycle of executors
is the responsibility of the cluster manager, and Spark provides a variety of cluster
managers with different characteristics:
Local
In local mode there is a single executor running in the same JVM as the driver. This
mode is useful for testing or running small jobs. The master URL for this mode is
local (use one thread), local[n] (n threads), or local[*] (one thread per core on the
machine).
Standalone
The standalone cluster manager is a simple distributed implementation that runs a single
Spark master and one or more workers. When a Spark application starts, the master will
ask the workers to spawn executor processes on behalf of the application. The master
URL is spark://host:port.
Mesos
Apache Mesos is a general-purpose cluster resource manager that allows fine-grained
sharing of resources across different applications according to an organizational policy.
By default (fine-grained mode), each Spark task is run as a Mesos task. This uses the
cluster resources more efficiently, but at the cost of additional process launch overhead.
In coarse-grained mode, executors run their tasks in-process, so the cluster resources are
held by the executor processes for the duration of the Spark application. The master
URL is mesos://host:port.
YARN
YARN is the resource manager used in Hadoop. Each Spark application corresponds to an
instance of a YARN application, and each executor runs in its own YARN container. The
master URL is yarn-client or yarn-cluster.
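The master URL formats listed above can be summarized in a small lookup. The following is a
minimal sketch in plain Python, not part of Spark: the names MASTER_URL_PATTERNS and
cluster_manager_for are hypothetical, and the patterns cover only the URL forms described in
this section (with the bracketed local[*] form for one thread per core).

```python
import re

# Hypothetical helper, not a Spark API: maps a master URL to the name of the
# cluster manager it selects, using only the URL forms described in the text.
MASTER_URL_PATTERNS = [
    (re.compile(r"^local$"), "local"),                     # one thread
    (re.compile(r"^local\[(\d+|\*)\]$"), "local"),         # n threads, or one per core
    (re.compile(r"^spark://[^:/]+:\d+$"), "standalone"),   # spark://host:port
    (re.compile(r"^mesos://[^:/]+:\d+$"), "mesos"),        # mesos://host:port
    (re.compile(r"^yarn-(client|cluster)$"), "yarn"),      # yarn-client, yarn-cluster
]


def cluster_manager_for(master_url):
    """Return the cluster manager name implied by a master URL."""
    for pattern, name in MASTER_URL_PATTERNS:
        if pattern.match(master_url):
            return name
    raise ValueError("unrecognized master URL: %r" % master_url)
```

For example, cluster_manager_for("local[4]") returns "local" and
cluster_manager_for("yarn-cluster") returns "yarn"; a string that matches none of the forms
raises ValueError. Host names and port numbers passed to it are whatever the cluster uses;
none are assumed here.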
The Mesos and YARN cluster managers are superior to the standalone manager since they
take into account the resource needs of other applications running on the cluster (MapReduce
jobs, for example) and enforce a scheduling policy across all of them. The standalone
cluster manager uses a static allocation of resources from the cluster, and therefore is not
able to adapt to the varying needs of other applications over time. Also, YARN is the only
cluster manager that is integrated with Hadoop's Kerberos security mechanisms (see
Secur-