Database Reference
In-Depth Information
Example 5-23. Saving a SequenceFile in Scala
val data = sc . parallelize ( List (( "Panda" , 3 ), ( "Kay" , 6 ), ( "Snail" , 2 )))
data . saveAsSequenceFile ( outputFile )
In Java saving a SequenceFile is slightly more involved, due to the lack of a saveAsSe
quenceFile() method on the JavaPairRDD . Instead, we use Spark's ability to save to
custom Hadoop formats , and we will show how to save to a SequenceFile in Java in
“Hadoop Input and Output Formats” on page 84 .
Object Files
Object files are a deceptively simple wrapper around SequenceFiles that allows us to
save our RDDs containing just values. Unlike with SequenceFiles, with object files the
values are written out using Java Serialization.
If you change your classes—for example, to add and remove fields
—old object files may no longer be readable. Object files use Java
Serialization, which has some support for managing compatibility
across class versions but requires programmer effort to do so.
Using Java Serialization for object files has a number of implications. Unlike with
normal SequenceFiles, the output will be different than Hadoop outputting the same
objects. Unlike the other formats, object files are mostly intended to be used for
Spark jobs communicating with other Spark jobs. Java Serialization can also be quite
slow.
Saving an object file is as simple as calling saveAsObjectFile on an RDD. Reading
an object file back is also quite simple: the function objectFile() on the SparkCon‐
text takes in a path and returns an RDD.
With all of these warnings about object files, you might wonder why anyone would
use them. The primary reason to use object files is that they require almost no work
to save almost arbitrary objects.
Object files are not available in Python, but the Python RDDs and SparkContext sup‐
port methods called saveAsPickleFile() and pickleFile() instead. These use
Python's pickle serialization library. The same caveats for object files apply to pickle
files, however: the pickle library can be slow, and old files may not be readable if you
change your classes.
Search WWH ::




Custom Search