they can store arbitrary types using a variety of serialization frameworks. (These topics are covered in SequenceFile.)
To use data from sequence files as the input to MapReduce, you can use SequenceFileInputFormat. The keys and values are determined by the sequence file, and you need to make sure that your map input types correspond. For example, if your sequence file has IntWritable keys and Text values, like the one created in …, then the map signature would be Mapper<IntWritable, Text, K, V>, where K and V are the types of the map's output keys and values.
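Because the key and value classes are recorded in the sequence file's header and generic types are erased at run time, a mapper declared with the wrong input types typically fails only when a record reaches it, usually with a ClassCastException. The stdlib-only Java sketch below models that correspondence; it is not Hadoop code. `SequenceFileHeader`, `TypedMapper`, `InputTypeCheck`, and `swapMapper` are hypothetical names, and `Integer`/`String` stand in for `IntWritable`/`Text`.

```java
import java.util.Map;

// Hypothetical stand-in for a sequence file header, which records the
// classes of the keys and values stored in the file.
final class SequenceFileHeader {
    final Class<?> keyClass;
    final Class<?> valueClass;

    SequenceFileHeader(Class<?> keyClass, Class<?> valueClass) {
        this.keyClass = keyClass;
        this.valueClass = valueClass;
    }
}

// Hypothetical mapper shape mirroring the parameter order of
// Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>; the input classes are kept
// explicitly so they can be checked at run time.
abstract class TypedMapper<KIN, VIN, KOUT, VOUT> {
    final Class<KIN> inKey;
    final Class<VIN> inValue;

    TypedMapper(Class<KIN> inKey, Class<VIN> inValue) {
        this.inKey = inKey;
        this.inValue = inValue;
    }

    abstract Map.Entry<KOUT, VOUT> map(KIN key, VIN value);
}

final class InputTypeCheck {
    // True when every record described by the header can be passed to the mapper.
    static boolean checkInputTypes(SequenceFileHeader header, TypedMapper<?, ?, ?, ?> mapper) {
        return mapper.inKey.isAssignableFrom(header.keyClass)
            && mapper.inValue.isAssignableFrom(header.valueClass);
    }

    // Hands one untyped record to the mapper; Class.cast throws
    // ClassCastException when the record's types do not match.
    static <KIN, VIN> Map.Entry<?, ?> feed(TypedMapper<KIN, VIN, ?, ?> mapper,
                                           Object key, Object value) {
        return mapper.map(mapper.inKey.cast(key), mapper.inValue.cast(value));
    }

    // Mapper for Integer keys and String values (standing in for IntWritable
    // and Text) that emits (value, key), so K = String and V = Integer.
    static TypedMapper<Integer, String, String, Integer> swapMapper() {
        return new TypedMapper<Integer, String, String, Integer>(Integer.class, String.class) {
            @Override
            Map.Entry<String, Integer> map(Integer key, String value) {
                return Map.entry(value, key);
            }
        };
    }
}
```

Feeding `swapMapper()` a record whose key is a `Long` fails at the cast, which is the model's analogue of pointing an IntWritable-keyed mapper at a file with LongWritable keys.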
NOTE
Although its name doesn't give it away, SequenceFileInputFormat can read map files as well as sequence files. If it finds a directory where it was expecting a sequence file, SequenceFileInputFormat assumes that it is reading a map file and uses its data file. This is why there is no MapFileInputFormat class.
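The directory check described in the note can be modelled in a few lines of plain Java. This is a simplified sketch using `java.nio.file`, not Hadoop's own implementation; `MapFileAwareListing` and `resolveInputs` are hypothetical names, and it assumes the map file's data file is named `data`, the name a map file gives it alongside its `index` file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class MapFileAwareListing {
    // A map file is a directory holding a "data" file (a sorted sequence
    // file) and an "index" file; only the data file is read as input.
    static final String DATA_FILE_NAME = "data";

    // Replaces each directory among the input paths with its data file,
    // leaving ordinary sequence files untouched.
    static List<Path> resolveInputs(List<Path> inputs) {
        List<Path> resolved = new ArrayList<>();
        for (Path p : inputs) {
            resolved.add(Files.isDirectory(p) ? p.resolve(DATA_FILE_NAME) : p);
        }
        return resolved;
    }
}
```

Since the index file is simply ignored, a separate map-file input format would add nothing beyond this substitution.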
SequenceFileAsTextInputFormat
SequenceFileAsTextInputFormat is a variant of SequenceFileInputFormat that converts the sequence file's keys and values to Text objects. The conversion is performed by calling toString() on the keys and values. This format makes sequence files suitable input for Streaming.
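The conversion amounts to calling toString() on each key and value. The stdlib-only sketch below illustrates it with `String` standing in for `Text`; `AsTextConversion`, `toText`, and `toStreamingLine` are hypothetical names, not Hadoop APIs. It also assumes Streaming's default behaviour of handing a record to the external process as one line with a tab between key and value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class AsTextConversion {
    // Converts each (key, value) record to a pair of strings by calling
    // toString(), mirroring the conversion to Text described above.
    static List<Map.Entry<String, String>> toText(List<? extends Map.Entry<?, ?>> records) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<?, ?> r : records) {
            out.add(Map.entry(r.getKey().toString(), r.getValue().toString()));
        }
        return out;
    }

    // Renders a converted record as a tab-separated line, the default form in
    // which Streaming passes a key and value to the mapper process.
    static String toStreamingLine(Map.Entry<String, String> record) {
        return record.getKey() + "\t" + record.getValue();
    }
}
```

Any key or value type works this way, but the text is only as useful as that type's toString() implementation.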
SequenceFileAsBinaryInputFormat
SequenceFileAsBinaryInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence file's keys and values as opaque binary objects. They are encapsulated as BytesWritable objects, and the application is free to interpret the underlying byte array as it pleases. In combination with a process that creates sequence files with SequenceFile.Writer's appendRaw() method or SequenceFileAsBinaryOutputFormat, this provides a way to use any binary data types with MapReduce (packaged as a sequence file), although plugging into Hadoop's serialization mechanism is normally a cleaner alternative (see Serialization Frameworks).
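With this format, decoding the bytes is entirely the application's job. The stdlib-only sketch below shows one way a hypothetical record (a sensor id and a reading, invented for illustration) could be packed into raw bytes before being written as a binary value, and decoded again from an opaque buffer. It does not call Hadoop. The `buffer`/`length` pair stands in for `BytesWritable.getBytes()` and `getLength()`, and the decoder honours the length because a BytesWritable's backing array can be longer than its valid data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class OpaqueBinaryRecords {
    // Hypothetical application record carried as opaque bytes.
    static final class Reading {
        final int sensorId;
        final double value;

        Reading(int sensorId, double value) {
            this.sensorId = sensorId;
            this.value = value;
        }
    }

    // Packs the record by hand: 4-byte big-endian int, then an 8-byte double.
    static byte[] encode(int sensorId, double value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(sensorId);
            out.writeDouble(value);
        }
        return bytes.toByteArray();
    }

    // Decodes only the first `length` bytes of the buffer, ignoring any
    // padding beyond the valid data.
    static Reading decode(byte[] buffer, int length) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer, 0, length))) {
            int sensorId = in.readInt();
            double value = in.readDouble();
            return new Reading(sensorId, value);
        }
    }
}
```

Nothing in the bytes records their layout, so writer and reader must agree on it out of band; that coupling is why a registered serialization is usually the cleaner choice.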