Spark - Hadoop: The Definitive Guide


Figure 19-4. How Spark executors are started in YARN cluster mode

In both YARN modes, the executors are launched before there is any data locality information available, so it could be that they end up not being co-located on the datanodes hosting the files that the jobs access. For interactive sessions, this may be acceptable, particularly as it may not be known which datasets are going to be accessed before the session starts. This is less true of production jobs, however, so Spark provides a way to give placement hints to improve data locality when running in YARN cluster mode.

The SparkContext constructor can take a second argument of preferred locations, computed from the input format and path using the InputFormatInfo helper class. For example, for text files, we use TextInputFormat:

val preferredLocations = InputFormatInfo.computePreferredLocations(
  Seq(new InputFormatInfo(new Configuration(),
    classOf[TextInputFormat],
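To make the idea of a placement hint more concrete, the following is a minimal sketch in plain Scala with no Spark dependency. It assumes that the hints take the general form of a map from hostname to the input splits stored on that host. The names Split, PlacementHints, and byHost are hypothetical and exist only for illustration; they are not part of Spark's API, and the paths and hostnames are made up.

```scala
// Hypothetical stand-in for an input split: a file path plus the hosts
// that hold replicas of its blocks. Not a Spark or Hadoop class.
final case class Split(path: String, hosts: Seq[String])

object PlacementHints {
  // Invert "split -> hosts" into "host -> splits", which is the shape
  // assumed here for a set of preferred locations.
  def byHost(splits: Seq[Split]): Map[String, Set[Split]] =
    splits
      .flatMap(split => split.hosts.map(host => host -> split))
      .groupBy { case (host, _) => host }
      .map { case (host, pairs) => host -> pairs.map(_._2).toSet }

  def main(args: Array[String]): Unit = {
    // Illustrative data only: two files with overlapping replica hosts.
    val splits = Seq(
      Split("/data/a.txt", Seq("node1", "node2")),
      Split("/data/b.txt", Seq("node2", "node3"))
    )
    // Print each host with the number of splits it stores locally.
    byHost(splits).toSeq.sortBy(_._1).foreach { case (host, local) =>
      println(s"$host holds ${local.size} split(s)")
    }
  }
}
```

A host that appears in many splits, such as node2 in the sample data, is a better candidate for an executor than one that holds few, which is the kind of information a resource manager could use when it is given hints up front.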
