Executors and Cluster Managers
We have seen how Spark relies on executors to run the tasks that make up a Spark job, but
we glossed over how the executors actually get started. Managing the lifecycle of executors
is the responsibility of the cluster manager, and Spark provides a variety of cluster managers
with different characteristics:
Local
In local mode there is a single executor running in the same JVM as the driver. This
mode is useful for testing or running small jobs. The master URL for this mode is local
(use one thread), local[n] (n threads), or local[*] (one thread per core on the
machine).
Standalone
The standalone cluster manager is a simple distributed implementation that runs a single
Spark master and one or more workers. When a Spark application starts, the master will
ask the workers to spawn executor processes on behalf of the application. The master
URL is spark://host:port.
Mesos
Apache Mesos is a general-purpose cluster resource manager that allows fine-grained
sharing of resources across different applications according to an organizational policy.
By default (fine-grained mode), each Spark task is run as a Mesos task. This uses the
cluster resources more efficiently, but at the cost of additional process launch overhead.
In coarse-grained mode, executors run their tasks in-process, so the cluster resources are
held by the executor processes for the duration of the Spark application. The master
URL is mesos://host:port.
YARN
YARN is the resource manager used in Hadoop (see Chapter 4). Each running Spark application
corresponds to an instance of a YARN application, and each executor runs in
its own YARN container. The master URL is yarn-client or yarn-cluster.
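The master URL forms listed above can be summarized in a small Python sketch. This is only an illustration, not part of Spark: the describe_master function and the MasterInfo type are hypothetical names, and the sketch recognizes only the URL forms named in this section (Spark itself accepts further variants that are not handled here).

```python
import os
import re
from typing import NamedTuple, Optional


class MasterInfo(NamedTuple):
    """Hypothetical summary of a master URL (not a Spark type)."""
    manager: str             # "local", "standalone", "mesos", or "yarn"
    detail: Optional[str]    # "host:port", the YARN mode, or None for local
    threads: Optional[int]   # thread count, set for local mode only


# local, local[n], or local[*]
_LOCAL_RE = re.compile(r"^local(?:\[(\*|\d+)\])?$")
# spark://host:port or mesos://host:port
_HOSTPORT_RE = re.compile(r"^(spark|mesos)://([^:/\s]+):(\d+)$")


def describe_master(url: str) -> MasterInfo:
    """Classify a master URL into the cluster manager it selects."""
    m = _LOCAL_RE.match(url)
    if m:
        spec = m.group(1)
        if spec is None:
            threads = 1                      # plain "local": one thread
        elif spec == "*":
            threads = os.cpu_count() or 1    # one thread per core
        else:
            threads = int(spec)              # explicit thread count
            if threads < 1:
                raise ValueError(f"thread count must be positive: {url!r}")
        return MasterInfo("local", None, threads)

    m = _HOSTPORT_RE.match(url)
    if m:
        scheme, host, port = m.groups()
        manager = "standalone" if scheme == "spark" else "mesos"
        return MasterInfo(manager, f"{host}:{port}", None)

    if url in ("yarn-client", "yarn-cluster"):
        # The part after the hyphen is the YARN mode.
        return MasterInfo("yarn", url.split("-", 1)[1], None)

    raise ValueError(f"unrecognized master URL: {url!r}")


if __name__ == "__main__":
    # The host name below is made up for illustration.
    for url in ("local", "local[4]", "spark://master1:7077", "yarn-cluster"):
        print(url, "->", describe_master(url))
```

In a real application the same string is what gets passed to spark-submit through its --master option, or set on the configuration with SparkConf.setMaster.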
The Mesos and YARN cluster managers are superior to the standalone manager since they
take into account the resource needs of other applications running on the cluster (MapReduce
jobs, for example) and enforce a scheduling policy across all of them. The standalone
cluster manager uses a static allocation of resources from the cluster, and therefore is not
able to adapt to the varying needs of other applications over time. Also, YARN is the only
cluster manager that is integrated with Hadoop's Kerberos security mechanisms (see Security).