File-Based Data Structures
For some applications, you need a specialized data structure to hold your data. For doing
MapReduce-based processing, putting each blob of binary data into its own file doesn't
scale, so Hadoop developed a number of higher-level containers for these situations.
SequenceFile
Imagine a logfile where each log record is a new line of text. If you want to log binary
types, plain text isn't a suitable format. Hadoop's SequenceFile class fits the bill in this
situation, providing a persistent data structure for binary key-value pairs. To use it as a
logfile format, you would choose a key, such as a timestamp represented by a LongWritable,
and the value would be a Writable that represents the quantity being logged.
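The idea of a binary key-value log can be illustrated without Hadoop. The following is a minimal sketch using only the JDK; it is not the SequenceFile on-disk format, and the class name BinaryLogSketch, the method names, and the choice of a double as the logged quantity are assumptions made for illustration. Each record is a long timestamp key followed by a double value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only: a plain binary log of (timestamp, value) records.
class BinaryLogSketch {

    // Appends one record: an 8-byte timestamp key followed by an 8-byte double value.
    static void append(DataOutputStream out, long timestamp, double value) throws IOException {
        out.writeLong(timestamp);
        out.writeDouble(value);
    }

    // Writes three sample records to an in-memory buffer and returns the raw bytes.
    static byte[] writeSample() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            append(out, 1000L, 0.5);
            append(out, 2000L, 1.5);
            append(out, 3000L, 2.5);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads every record back and returns the sum of the logged values.
    static double sumValues(byte[] log) {
        double sum = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(log))) {
            while (in.available() > 0) {
                in.readLong();            // key (timestamp), not needed for the sum
                sum += in.readDouble();   // value
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }
}
```

A text logfile would have to encode each number as characters and parse it again on reading; here the values round-trip as fixed-width binary fields, which is the property the SequenceFile provides in a more general, persistent form.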
SequenceFiles also work well as containers for smaller files. HDFS and MapReduce
are optimized for large files, so packing files into a SequenceFile makes storing and
processing the smaller files more efficient ("Processing a whole file as a record" contains a
program to pack files into a SequenceFile). [47]
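The packing idea can be sketched with the JDK alone: the filename serves as the key and the file contents as the value, and many small files end up in one container. This is not the program referred to above and does not use the SequenceFile format; the class SmallFilePackSketch and the length-prefixed layout are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: packs (filename, contents) pairs into one byte container.
class SmallFilePackSketch {

    // Writes each entry as: filename (modified UTF-8), content length, content bytes.
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> entry : files.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().length);
                out.write(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the container back into a map, preserving the original entry order.
    static Map<String, byte[]> unpack(byte[] container) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(container))) {
            while (in.available() > 0) {
                String name = in.readUTF();
                byte[] contents = new byte[in.readInt()];
                in.readFully(contents);
                files.put(name, contents);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }
}
```

The point of the sketch is only the shape of the data: one large container holding many keyed records, rather than one file per blob.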
Writing a SequenceFile
To create a SequenceFile, use one of its createWriter() static methods, which
return a SequenceFile.Writer instance. There are several overloaded versions, but
they all require you to specify a stream to write to (either an FSDataOutputStream or
a FileSystem and Path pairing), a Configuration object, and the key and value
types. Optional arguments include the compression type and codec, a Progressable
callback to be informed of write progress, and a Metadata instance to be stored in the
SequenceFile header.
The keys and values stored in a SequenceFile do not necessarily need to be
Writables. Any types that can be serialized and deserialized by a Serialization may be
used.
Once you have a SequenceFile.Writer, you then write key-value pairs using the
append() method. When you've finished, you call the close() method
(SequenceFile.Writer implements java.io.Closeable).
Example 5-10 shows a short program to write some key-value pairs to a SequenceFile
using the API just described.
Example 5-10. Writing a SequenceFile