Most logging and instrumentation in Spark is expressed in terms of stages, tasks, and shuffles. Understanding how user code compiles down into the bits of physical execution is an advanced concept, but one that will help you immensely in tuning and debugging applications.
To summarize, the following phases occur during Spark execution:
User code defines a DAG (directed acyclic graph) of RDDs
Operations on RDDs create new RDDs that refer back to their parents, thereby
creating a graph.
Actions force translation of the DAG to an execution plan
When you call an action on an RDD, it must be computed. This requires computing its parent RDDs as well. Spark's scheduler submits a job to compute all needed RDDs. That job will have one or more stages, which are parallel waves of computation composed of tasks. Each stage will correspond to one or more RDDs in the DAG. A single stage can correspond to multiple RDDs due to pipelining.
Tasks are scheduled and executed on a cluster
Stages are processed in order, with individual tasks launching to compute segments of the RDD. Once the final stage is finished in a job, the action is complete.
In a given Spark application, this entire sequence of steps may occur many times in a
continuous fashion as new RDDs are created.
Finding Information
Spark records detailed progress information and performance metrics as applications execute. These are presented to the user in two places: the Spark web UI and the logfiles produced by the driver and executor processes.
Spark Web UI
The first stop for learning about the behavior and performance of a Spark application is Spark's built-in web UI. This is available on the machine where the driver is running, at port 4040 by default. One caveat is that in YARN cluster mode, where the application driver runs inside the cluster, you should access the UI through the YARN ResourceManager, which proxies requests directly to the driver.
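If another application is already serving on port 4040, the UI can be moved to a different port with the `spark.ui.port` property; the value below is illustrative, set for example in `spark-defaults.conf`:

```
# spark-defaults.conf: serve the driver's web UI on a non-default port
spark.ui.port    4050
```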
The Spark UI contains several different pages, and the exact format may differ across
Spark versions. As of Spark 1.2, the UI is composed of four different sections, which
we'll cover next.