Serialization
There are two aspects of serialization to consider in Spark: serialization of data and
serialization of functions (or closures).
Data
Let's look at data serialization first. By default, Spark uses Java serialization to send
data over the network from one executor to another, or when caching (persisting) data in
serialized form as described in Persistence levels. Java serialization is well understood by
programmers (you make sure the class you are using implements
java.io.Serializable or java.io.Externalizable), but it is not particularly
efficient in terms of either speed or serialized size.
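To make the size overhead concrete, here is a minimal sketch that round-trips one object through plain Java serialization using only the JDK, with no Spark involved. The fields of WeatherRecord are assumptions chosen for illustration; the text does not define them.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical record type; the fields are assumptions for illustration.
// Scala case classes are Serializable by default.
case class WeatherRecord(stationId: String, year: Int, temperature: Int)

object JavaSerializationDemo {
  // Serialize an object to a byte array with standard Java serialization.
  def serialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    buffer.toByteArray
  }

  // Read the object back from the byte array.
  def deserialize[T](bytes: Array[Byte]): T = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val record = WeatherRecord("011990-99999", 1950, 22)
    val bytes = serialize(record)
    val copy = deserialize[WeatherRecord](bytes)
    // The payload is a 12-character string plus two Ints (about 20 bytes of data);
    // the serialized form also carries the class descriptor and field names.
    println(s"serialized size: ${bytes.length} bytes, round trip ok: ${copy == record}")
  }
}
```

The serialized size printed here is larger than the raw payload because the stream describes the class as well as the values, which is the overhead the paragraph above refers to.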
A better choice for most Spark programs is Kryo serialization. Kryo is a more efficient
general-purpose serialization library for Java. To use Kryo serialization, set the
spark.serializer property on the SparkConf in your driver program as follows:
conf.set("spark.serializer",
  "org.apache.spark.serializer.KryoSerializer")
Kryo does not require that a class implement a particular interface (like
java.io.Serializable) to be serialized, so plain old Java objects can be used in
RDDs without any further work beyond enabling Kryo serialization. Having said that, it is
much more efficient to register classes with Kryo before using them. This is because Kryo
writes a reference to the class of the object being serialized (one reference is written for
every object written), which is just an integer identifier if the class has been registered but
is the full classname otherwise. This guidance only applies to your own classes; Spark
registers Scala classes and many other framework classes (like Avro Generic or Thrift
classes) on your behalf.
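The per-object cost of an unregistered class can be estimated with a toy model. The sketch below is not Kryo's actual wire format: it simply assumes a registered class costs a fixed 4 bytes per object and an unregistered class costs the UTF-8 length of its name. The object name ClassReferenceCost, the ID value 10, and the record count are all hypothetical.

```scala
import java.nio.charset.StandardCharsets

object ClassReferenceCost {
  // Toy model only, NOT Kryo's real encoding: a registered class is charged a
  // fixed 4 bytes for its integer ID, an unregistered one the UTF-8 length of
  // its fully qualified name.
  def classRefBytes(cls: Class[_], registry: Map[Class[_], Int]): Int =
    registry.get(cls) match {
      case Some(_) => 4
      case None    => cls.getName.getBytes(StandardCharsets.UTF_8).length
    }

  def main(args: Array[String]): Unit = {
    val cls: Class[_] = classOf[java.util.concurrent.ConcurrentHashMap[_, _]]
    val registry: Map[Class[_], Int] = Map(cls -> 10) // hypothetical ID
    val objects = 1000000L

    // One class reference is written per object, so the cost scales with the count.
    val unregistered = classRefBytes(cls, Map.empty) * objects
    val registered = classRefBytes(cls, registry) * objects
    println(s"unregistered: $unregistered bytes, registered: $registered bytes")
  }
}
```

Because the reference is repeated for every object, even a modest difference per reference adds up over a large dataset.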
Registering classes with Kryo is straightforward. Create a subclass of
KryoRegistrator, and override the registerClasses() method:
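import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

// Registers application classes so that Kryo can refer to them by integer ID
class CustomKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[WeatherRecord])
  }
}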
Finally, in your driver program, set the spark.kryo.registrator property to the
fully qualified classname of your KryoRegistrator implementation:
conf.set("spark.kryo.registrator", "CustomKryoRegistrator")
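Putting the pieces together, a driver program might look like the following sketch. It assumes spark-core is on the classpath and runs in local mode; the WeatherRecord fields, the application name, and the sample values are hypothetical. It uses classOf[...].getName instead of string literals so that the fully qualified classnames cannot be mistyped, which matters once the registrator lives in a package.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.{KryoRegistrator, KryoSerializer}
import org.apache.spark.storage.StorageLevel

// Hypothetical record type; the fields are assumptions for illustration.
case class WeatherRecord(stationId: String, year: Int, temperature: Int)

class CustomKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[WeatherRecord])
  }
}

object KryoDriverSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KryoDriverSketch")
      .setMaster("local[*]") // assumption: local mode, for experimentation only
      .set("spark.serializer", classOf[KryoSerializer].getName)
      .set("spark.kryo.registrator", classOf[CustomKryoRegistrator].getName)

    val sc = new SparkContext(conf)
    try {
      val records = sc.parallelize(Seq(
        WeatherRecord("011990-99999", 1950, 22),
        WeatherRecord("012650-99999", 1949, 111)))
      // Persisting in serialized form makes Spark run the records through Kryo.
      records.persist(StorageLevel.MEMORY_ONLY_SER)
      println(records.count())
    } finally {
      sc.stop()
    }
  }
}
```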