they can store arbitrary types using a variety of serialization frameworks. (These topics
are covered in SequenceFile.)
To use data from sequence files as the input to MapReduce, you can use
SequenceFileInputFormat. The keys and values are determined by the sequence
file, and you need to make sure that your map input types correspond. For example, if
your sequence file has IntWritable keys and Text values, like the one created in
Chapter 5, then the map signature would be Mapper<IntWritable, Text, K, V>,
where K and V are the types of the map's output keys and values.
NOTE
Although its name doesn't give it away, SequenceFileInputFormat can read map files as well as
sequence files. If it finds a directory where it was expecting a sequence file,
SequenceFileInputFormat assumes that it is reading a map file and uses its datafile. This is why there is no
MapFileInputFormat class.
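The directory check described in the note can be sketched in plain Java. This is only an illustration of the idea, not Hadoop's own code: the class MapFileAwareResolver and the method resolveInput are hypothetical, and the sketch assumes that a map file's datafile is named "data" inside the map file's directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; this is not Hadoop's actual code.
class MapFileAwareResolver {

    // Assumption: a map file is a directory whose datafile is named "data".
    static final String DATA_FILE_NAME = "data";

    // Returns the file to read: the path itself when it is not a directory,
    // or the datafile inside it when the path is a directory.
    static Path resolveInput(Path path) {
        if (Files.isDirectory(path)) {
            return path.resolve(DATA_FILE_NAME);
        }
        return path;
    }

    public static void main(String[] args) {
        // Resolve the path given on the command line, or the current directory.
        Path input = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(resolveInput(input));
    }
}
```

Because the decision is made per input path, a single reader can serve both kinds of file, which is consistent with there being no separate input format for map files.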
SequenceFileAsTextInputFormat
SequenceFileAsTextInputFormat is a variant of SequenceFileInputFormat
that converts the sequence file's keys and values to Text objects. The
conversion is performed by calling toString() on the keys and values. This format makes
sequence files suitable input for Streaming.
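As a rough illustration of the toString() conversion, the following plain-Java sketch turns an arbitrary key and value into text. The class TextRecordSketch and its methods are hypothetical and do not use Hadoop types; the tab-separated line it builds is an assumption about how a line-oriented consumer such as a Streaming process might receive the pair.

```java
// Hypothetical helper for illustration; not part of Hadoop.
class TextRecordSketch {

    // Converts a key/value pair to text by calling toString() on each side.
    static String[] asText(Object key, Object value) {
        return new String[] { key.toString(), value.toString() };
    }

    // Assumption: the text pair is handed to a line-oriented consumer
    // as "key<TAB>value".
    static String toLine(Object key, Object value) {
        String[] pair = asText(key, value);
        return pair[0] + "\t" + pair[1];
    }

    public static void main(String[] args) {
        // Any object with a sensible toString() can stand in for a key or value.
        System.out.println(toLine(Integer.valueOf(100), "One, two, buckle my shoe"));
    }
}
```

The practical consequence is that the usefulness of the text a downstream process sees depends entirely on how the key and value types implement toString().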
SequenceFileAsBinaryInputFormat
SequenceFileAsBinaryInputFormat is a variant of SequenceFileInputFormat
that retrieves the sequence file's keys and values as opaque binary objects.
They are encapsulated as BytesWritable objects, and the application is free to interpret
the underlying byte array as it pleases. In combination with a process that creates
sequence files with SequenceFile.Writer's appendRaw() method or
SequenceFileAsBinaryOutputFormat, this provides a way to use any binary
data types with MapReduce (packaged as a sequence file), although plugging into
Hadoop's serialization mechanism is normally a cleaner alternative (see Serialization
Frameworks).
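To make "interpret the underlying byte array" concrete, here is a minimal plain-Java sketch of decoding raw record bytes. The class RawRecordDecoder is hypothetical, and the byte layouts are assumptions chosen for illustration: a 4-byte big-endian integer (the layout produced by DataOutput.writeInt()) and a run of UTF-8 bytes. Both methods take an explicit length on the assumption that the backing array may be longer than the valid data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Hadoop.
class RawRecordDecoder {

    // Decodes a 4-byte big-endian int from the first `length` valid bytes.
    static int decodeInt(byte[] buffer, int length) {
        if (length < 4) {
            throw new IllegalArgumentException("need at least 4 valid bytes");
        }
        // ByteBuffer reads in big-endian order by default.
        return ByteBuffer.wrap(buffer, 0, length).getInt();
    }

    // Decodes the first `length` valid bytes as a UTF-8 string.
    static String decodeUtf8(byte[] buffer, int length) {
        return new String(buffer, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Four valid bytes followed by padding that must be ignored.
        byte[] padded = {0, 0, 0, 42, 0, 0, 0, 0};
        System.out.println(decodeInt(padded, 4));
    }
}
```

Hand-written decoding like this has to be kept in step with whatever wrote the bytes, which is one reason the serialization mechanism is usually the cleaner route.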