MapReduce Types and Formats - Hadoop: The Definitive Guide

they can store arbitrary types using a variety of serialization frameworks. (These topics
are covered in SequenceFile.)

To use data from sequence files as the input to MapReduce, you can use
SequenceFileInputFormat. The keys and values are determined by the sequence
file, and you need to make sure that your map input types correspond. For example, if
your sequence file has IntWritable keys and Text values, like the one created in
Chapter 5, then the map signature would be Mapper<IntWritable, Text, K, V>,
where K and V are the types of the map's output keys and values.

NOTE

Although its name doesn't give it away, SequenceFileInputFormat can read map files as well as
sequence files. If it finds a directory where it was expecting a sequence file,
SequenceFileInputFormat assumes that it is reading a map file and uses its data file. This is
why there is no MapFileInputFormat class.

SequenceFileAsTextInputFormat

SequenceFileAsTextInputFormat is a variant of SequenceFileInputFormat that converts
the sequence file's keys and values to Text objects. The conversion is performed by
calling toString() on the keys and values. This format makes sequence files suitable
input for Streaming.
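
The conversion can be pictured with a minimal plain-Java sketch that does not depend on Hadoop. The class and method names (TextConversionSketch, toTextPair, toStreamingLine) are hypothetical, and String stands in for Hadoop's Text. The tab-separated line is an assumption about how a Streaming-style consumer typically sees each record, not a statement of what the real class emits.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

final class TextConversionSketch {

    // Hypothetical helper: turn any key/value pair into a pair of strings by
    // calling toString() on each side, mirroring the conversion described above.
    static Map.Entry<String, String> toTextPair(Object key, Object value) {
        return new SimpleEntry<>(key.toString(), value.toString());
    }

    // Assumption: a Streaming-style consumer reads one "key<TAB>value" line per record.
    static String toStreamingLine(Object key, Object value) {
        Map.Entry<String, String> pair = toTextPair(key, value);
        return pair.getKey() + "\t" + pair.getValue();
    }

    public static void main(String[] args) {
        // An Integer key and a StringBuilder value stand in for IntWritable and Text.
        System.out.println(toStreamingLine(100, new StringBuilder("One, two, buckle my shoe")));
    }
}
```

Because only toString() is involved, any key or value type with a sensible string form can be handed to a text-oriented consumer this way.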

SequenceFileAsBinaryInputFormat

SequenceFileAsBinaryInputFormat is a variant of SequenceFileInputFormat that retrieves
the sequence file's keys and values as opaque binary objects. They are encapsulated as
BytesWritable objects, and the application is free to interpret the underlying byte array
as it pleases. In combination with a process that creates sequence files with
SequenceFile.Writer's appendRaw() method or SequenceFileAsBinaryOutputFormat, this
provides a way to use any binary data types with MapReduce (packaged as a sequence file),
although plugging into Hadoop's serialization mechanism is normally a cleaner alternative
(see Serialization Frameworks).
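
As a minimal sketch of what interpreting the byte array can look like, the plain-Java example below decodes a key whose bytes are a four-byte big-endian integer, which is the layout IntWritable writes. It avoids Hadoop classes so that it runs standalone: the backing and length parameters stand in for what BytesWritable's getBytes() and getLength() would return, and the class and method names (RawKeySketch, decodeIntKey) are hypothetical.

```java
import java.nio.ByteBuffer;

final class RawKeySketch {

    // Hypothetical helper: decode the first four valid bytes as a big-endian int.
    // `backing` and `length` stand in for BytesWritable.getBytes() and getLength();
    // the backing array may be longer than the valid data, so length is checked.
    static int decodeIntKey(byte[] backing, int length) {
        if (length < Integer.BYTES) {
            throw new IllegalArgumentException("need at least 4 valid bytes, got " + length);
        }
        // ByteBuffer reads big-endian by default, matching DataOutput.writeInt().
        return ByteBuffer.wrap(backing, 0, length).getInt();
    }

    public static void main(String[] args) {
        // 0x00000102 is 258; the two trailing bytes lie beyond the valid length.
        byte[] backing = {0x00, 0x00, 0x01, 0x02, 0x7f, 0x7f};
        System.out.println(decodeIntKey(backing, 4));
    }
}
```

Passing the valid length explicitly matters because the backing array of a BytesWritable can be larger than the data it currently holds.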
