  val csvWriter = new CSVWriter(stringWriter)
  csvWriter.writeAll(people.toList)
  Iterator(stringWriter.toString)
}.saveAsTextFile(outFile)
As you may have noticed, the preceding examples work only if we know in advance
all of the fields that we will be outputting. If some of the field names are
determined at runtime from user input, we need to take a different approach. The
simplest one is to make one pass over the data to extract the distinct keys, and
then a second pass to write the output.
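The two-pass idea can be sketched on plain Scala collections, which keeps the example runnable without a cluster. The object name DynamicCsv, the sample records, and the choice to sort the header and fill missing fields with empty strings are all assumptions made for illustration; none of them come from the text.

```scala
object DynamicCsv {
  // Hypothetical records whose field names are only known at runtime.
  val records: Seq[Map[String, String]] = Seq(
    Map("name" -> "Holden", "favoriteAnimal" -> "panda"),
    Map("name" -> "Sparky", "city" -> "Berkeley")
  )

  // Pass 1: collect every distinct key and fix a column order.
  // On an RDD this would be roughly rdd.flatMap(_.keys).distinct().collect().
  def header(rows: Seq[Map[String, String]]): Seq[String] =
    rows.flatMap(_.keys).distinct.sorted

  // Pass 2: emit each record in header order, using an empty cell
  // for any key the record does not have.
  def toRows(rows: Seq[Map[String, String]]): Seq[Seq[String]] = {
    val cols = header(rows)
    cols +: rows.map(r => cols.map(k => r.getOrElse(k, "")))
  }

  def main(args: Array[String]): Unit =
    // Naive join for display only; it does no quoting or escaping.
    toRows(records).foreach(row => println(row.mkString(",")))
}
```

In a real job the rows produced by the second pass would be handed to a CSV writer, as in the preceding example, rather than joined by hand.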
SequenceFiles
SequenceFiles are a popular Hadoop format composed of flat files with key/value
pairs. SequenceFiles have sync markers that allow Spark to seek to a point in the file
and then resynchronize with the record boundaries. This allows Spark to efficiently
read SequenceFiles in parallel from multiple nodes. SequenceFiles are a common
input/output format for Hadoop MapReduce jobs as well, so if you are working with
an existing Hadoop system there is a good chance your data will be available as a
SequenceFile.
SequenceFiles consist of elements that implement Hadoop's Writable interface, as
Hadoop uses a custom serialization framework. Table 5-2 lists some common types
and their corresponding Writable classes. The standard rule of thumb is to try
adding the word Writable to the end of your class name and see if it is a known
subclass of org.apache.hadoop.io.Writable. If you can't find a Writable for the
data you are trying to write out (for example, a custom case class), you can
implement your own Writable class by overriding readFields and write from
org.apache.hadoop.io.Writable.
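As a minimal sketch of what such a custom Writable might look like, the class below serializes two fields in a fixed order and reads them back in the same order. PersonWritable, its fields, and the PersonWritableDemo helper are hypothetical names introduced here, and the example assumes the Hadoop client libraries are on the classpath.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{DataInput, DataInputStream, DataOutput, DataOutputStream}
import org.apache.hadoop.io.Writable

// Hypothetical record type, not from the text. Hadoop creates Writables
// reflectively, so the class needs a no-argument constructor and mutable fields.
class PersonWritable(var name: String, var age: Int) extends Writable {
  def this() = this("", 0)

  // Serialize the fields in a fixed order.
  override def write(out: DataOutput): Unit = {
    out.writeUTF(name)
    out.writeInt(age)
  }

  // Read the fields back in the same order they were written.
  override def readFields(in: DataInput): Unit = {
    name = in.readUTF()
    age = in.readInt()
  }
}

object PersonWritableDemo {
  // Round-trip a value through write and readFields using in-memory streams.
  def roundTrip(p: PersonWritable): PersonWritable = {
    val bytes = new ByteArrayOutputStream()
    p.write(new DataOutputStream(bytes))
    val copy = new PersonWritable()
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
    copy
  }
}
```

A plain class with var fields is used instead of a case class here because readFields has to overwrite the fields of an already constructed instance.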
Hadoop's RecordReader reuses the same object for each record, so
directly calling cache on an RDD you read in like this can fail;
instead, add a simple map() operation and cache its result. Furthermore,
many Hadoop Writable classes do not implement java.io.Serializable, so
for them to work in RDDs we need to convert them with a map() anyway.
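The object-reuse problem can be illustrated without Spark or Hadoop. The sketch below simulates it: Holder is a hypothetical stand-in for a mutable Writable, and reader mimics a RecordReader by handing back the same instance for every record. It is an illustration of the mechanism under those assumptions, not Spark's actual implementation.

```scala
object ReuseDemo {
  // Hypothetical stand-in for a mutable Hadoop Writable such as Text.
  final class Holder(var value: String)

  // Mimics a RecordReader: every element is the SAME Holder,
  // overwritten in place with the next record.
  def reader(records: Seq[String]): Iterator[Holder] = {
    val shared = new Holder("")
    records.iterator.map { r => shared.value = r; shared }
  }

  // Retaining the raw objects (as caching would) keeps N references
  // to one holder, so only the last record survives.
  def keptRaw(records: Seq[String]): List[String] =
    reader(records).toList.map(_.value)

  // Copying each record into an immutable value first, as a map()
  // would, preserves every record.
  def keptCopied(records: Seq[String]): List[String] =
    reader(records).map(_.value).toList
}
```

With Spark, the corresponding fix would look something like sc.sequenceFile(path, classOf[Text], classOf[IntWritable]).map { case (k, v) => (k.toString, v.get()) }.cache(), where path is a placeholder and the key and value types are assumed.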