Spark relies on a cluster manager to launch executors and, in certain cases, to launch the driver. The cluster manager is a pluggable component in Spark. This allows Spark to run on top of different external managers, such as YARN and Mesos, as well as its built-in Standalone cluster manager.
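For illustration, the choice of cluster manager is typically expressed as a master URL when the application is configured. The snippet below is a minimal sketch in Python; the application name and the specific hosts shown are placeholders, not values from this text:

    from pyspark import SparkConf, SparkContext

    # Pick one master URL depending on the cluster manager in use:
    #   "local[*]"            -- run locally, one thread per core (no cluster manager)
    #   "spark://host:7077"   -- Spark's built-in Standalone cluster manager
    #   "yarn"                -- Hadoop YARN
    #   "mesos://host:5050"   -- Apache Mesos
    conf = SparkConf().setAppName("example-app").setMaster("local[*]")
    sc = SparkContext(conf=conf)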
Spark's documentation consistently uses the terms driver and executor when describing the processes that execute each Spark application. The terms master and worker are used to describe the centralized and distributed portions of the cluster manager. It's easy to confuse these terms, so pay close attention. For instance, Hadoop YARN runs a master daemon (called the Resource Manager) and several worker daemons called Node Managers. Spark can run both drivers and executors on the YARN worker nodes.
Launching a Program
No matter which cluster manager you use, Spark provides a single script, called spark-submit, that you can use to submit your program to it. Through various options, spark-submit can connect to different cluster managers and control how many resources your application gets. For some cluster managers, spark-submit can run the driver within the cluster (e.g., on a YARN worker node), while for others, it can run it only on your local machine. We'll cover spark-submit in more detail in the next section.
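As a rough illustration, a spark-submit invocation against YARN might look like the following; the script name and resource sizes are hypothetical placeholders, though the flags themselves are standard spark-submit options:

    # Submit a Python application to YARN, running the driver inside the cluster.
    # my_script.py and the resource values are placeholders.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --executor-memory 2G \
      --num-executors 4 \
      my_script.py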
Summary
To summarize the concepts in this section, let's walk through the exact steps that occur when you run a Spark application on a cluster (a minimal code sketch follows the list):
1. The user submits an application using spark-submit .
2. spark-submit launches the driver program and invokes the main() method
specified by the user.
3. The driver program contacts the cluster manager to ask for resources to launch
executors.
4. The cluster manager launches executors on behalf of the driver program.
5. The driver process runs through the user application. Based on the RDD actions
and transformations in the program, the driver sends work to executors in the
form of tasks.
6. Tasks are run on executor processes to compute and save results.
7. If the driver's main() method exits or it calls SparkContext.stop(), it will terminate the executors and release resources from the cluster manager.
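To make these steps concrete, here is a minimal sketch of a driver program in Python; the application name and the data it processes are invented for illustration:

    from pyspark import SparkConf, SparkContext

    # Step 2: spark-submit invokes this main() on the driver.
    def main():
        # Steps 3-4: creating the SparkContext asks the cluster manager
        # for resources, and executors are launched on the driver's behalf.
        conf = SparkConf().setAppName("lifecycle-example")
        sc = SparkContext(conf=conf)

        # Step 5: transformations are recorded lazily on the driver...
        rdd = sc.parallelize(range(100)).map(lambda x: x * 2)

        # ...and an action sends work to the executors as tasks (step 6).
        total = rdd.sum()
        print(total)

        # Step 7: stopping the context releases executors and cluster resources.
        sc.stop()

    if __name__ == "__main__":
        main()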