Option(s): spark.serializer
Default: org.apache.spark.serializer.JavaSerializer
Explanation: Class to use for serializing objects that will be sent over the network or need to be cached in serialized form. The default of Java serialization works with any serializable Java object but is quite slow, so we recommend using org.apache.spark.serializer.KryoSerializer and configuring Kryo serialization when speed is necessary. Can be any subclass of org.apache.spark.serializer.Serializer.

Option(s): spark.[X].port
Default: (random)
Explanation: Allows setting integer port values to be used by a running Spark application. This is useful in clusters where network access is secured. The possible values of X are driver, fileserver, broadcast, replClassServer, blockManager, and executor.

Option(s): spark.eventLog.enabled
Default: false
Explanation: Set to true to enable event logging, which allows completed Spark jobs to be viewed using a history server. For more information about Spark's history server, see the official documentation.

Option(s): spark.eventLog.dir
Default: file:///tmp/spark-events
Explanation: The storage location used for event logging, if enabled. This needs to be in a globally visible filesystem such as HDFS.
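To make these options concrete, the following is a minimal sketch of applying them through SparkConf; the application name and the HDFS URL are placeholders, and the event log directory must point at a filesystem visible to every node in your cluster.

import org.apache.spark.{SparkConf, SparkContext}

// Switch to Kryo serialization and enable event logging.
// "TunedApp" and the HDFS URL below are hypothetical values.
val conf = new SparkConf()
  .setAppName("TunedApp")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs://namenode/shared/spark-events")

val sc = new SparkContext(conf)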
Almost all Spark configurations occur through the SparkConf construct, but one important option doesn't. To set the local storage directories for Spark to use for shuffle data (necessary for standalone and Mesos modes), you export the SPARK_LOCAL_DIRS environment variable inside conf/spark-env.sh to a comma-separated list of storage locations. SPARK_LOCAL_DIRS is described in detail in “Hardware Provisioning” on page 158. This is specified differently from other Spark configurations because its value may be different on different physical hosts.
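For example, conf/spark-env.sh on one host might contain a line like the following; the directory paths are hypothetical and will typically differ from host to host.

# conf/spark-env.sh (hypothetical paths; list the local disks on this host)
export SPARK_LOCAL_DIRS=/mnt/disk1/spark,/mnt/disk2/spark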
Components of Execution: Jobs, Tasks, and Stages
A first step in tuning and debugging Spark is to have a deeper understanding of the system's internal design. In previous chapters you saw the “logical” representation of RDDs and their partitions. When executing, Spark translates this logical representation into a physical execution plan by merging multiple operations into tasks. Understanding every aspect of Spark's execution is beyond the scope of this book, but an appreciation for the steps involved, along with the relevant terminology, can be helpful when tuning and debugging jobs.
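One concrete way to see the logical representation Spark starts from is the toDebugString method, which prints an RDD's lineage; shuffle dependencies appear as new indentation levels in its output, and these correspond to stage boundaries in the physical plan. A minimal sketch follows; the input path is a placeholder.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair RDD operations on older Spark versions

val sc = new SparkContext(new SparkConf().setAppName("LineageDemo"))

// Two narrow transformations (filter, map) pipeline into one stage;
// reduceByKey introduces a shuffle and hence a new stage.
// "input.txt" is a placeholder path.
val counts = sc.textFile("input.txt")
  .filter(line => line.nonEmpty)
  .map(line => (line.split(" ")(0), 1))
  .reduceByKey((x, y) => x + y)

// Print the RDD's lineage; shuffle boundaries show up as
// indentation changes in the output.
println(counts.toDebugString)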
 