Tuning and Debugging Spark - Learning Spark

Database Reference

In-Depth Information

you have an especially expensive stage, you can click through and better understand

what user code the stage is associated with.

Once you've narrowed down a stage of interest, the stage page, shown in Figure 8-3 ,

can help isolate performance issues. In data-parallel systems such as Spark, a com‐

mon source of performance issues is skew , which occurs when a small number of

tasks take a very large amount of time compared to others. The stage page can help

you identify skew by looking at the distribution of different metrics over all tasks. A

good starting point is the runtime of the task; do a few tasks take much more time

than others? If this is the case, you can dig deeper and see what is causing the tasks to

be slow. Do a small number of tasks read or write much more data than others? Are

tasks running on certain nodes very slow? These are useful first steps when you're

debugging a job.

Figure 8-3. The Spark application UI's stage detail page

In addition to looking at task skew, it can be helpful to identify how much time tasks

are spending in each of the phases of the task lifecycle: reading, computing, and writ‐

ing. If tasks spend very little time reading or writing data, but take a long time overall,

it might be the case that the user code itself is expensive (for an example of user code

Search WWH ::

Custom Search

Home