Figure 19-4. How Spark executors are started in YARN cluster mode
In both YARN modes, the executors are launched before there is any data locality
information available, so it could be that they end up not being co-located on the
datanodes hosting the files that the jobs access. For interactive sessions, this may be
acceptable, particularly as it may not be known which datasets are going to be accessed
before the session starts. This is less true of production jobs, however, so Spark
provides a way to give placement hints to improve data locality when running in YARN
cluster mode.
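As a rough illustration of what a placement hint contains, the sketch below models it as a map from hostname to the input splits stored on that host. It is plain Scala with no Spark or Hadoop dependency. The names Split, PlacementHints, preferredLocations, and hostsByLocalBytes are hypothetical, invented for this example, and are not part of Spark's API.

```scala
// Hypothetical model of an input split: a region of a file plus the
// hosts that hold a replica of it. Not a Spark or Hadoop class.
final case class Split(path: String, start: Long, length: Long, hosts: Seq[String])

object PlacementHints {
  // Group splits by host, so each hostname maps to the set of splits
  // that an executor on that host could read locally.
  def preferredLocations(splits: Seq[Split]): Map[String, Set[Split]] =
    splits
      .flatMap(s => s.hosts.map(h => h -> s))
      .groupBy(_._1)
      .map { case (host, pairs) => host -> pairs.map(_._2).toSet }

  // Rank hosts by the number of input bytes they hold locally
  // (largest first, ties broken by hostname).
  def hostsByLocalBytes(hints: Map[String, Set[Split]]): Seq[(String, Long)] =
    hints.toSeq
      .map { case (host, ss) => host -> ss.toSeq.map(_.length).sum }
      .sortBy { case (host, bytes) => (-bytes, host) }
}
```

Given such a map, a scheduler could favor the hosts at the top of the ranking when it requests containers for executors. How YARN actually weighs these hints is outside the scope of this sketch.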
The SparkContext constructor can take a second argument of preferred locations,
computed from the input format and path using the InputFormatInfo helper class.
For example, for text files, we use TextInputFormat:
val preferredLocations = InputFormatInfo.computePreferredLocations(
  Seq(new InputFormatInfo(new Configuration(),
    classOf[TextInputFormat],