type. This also causes no separator to be written, which makes the output suitable for
reading in using TextInputFormat.
Binary Output
SequenceFileOutputFormat
As the name indicates, SequenceFileOutputFormat writes sequence files for its
output. This is a good choice of output if it forms the input to a further MapReduce job,
since it is compact and is readily compressed. Compression is controlled via the static
methods on SequenceFileOutputFormat, as described in "Using Compression in
MapReduce". For an example of how to use SequenceFileOutputFormat, see "Sorting".
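The static compression methods mentioned above can be sketched as follows. This is a minimal illustration, not the book's own example: the class name CompressedSequenceFileSetup and the method name configure are hypothetical, the choice of GzipCodec and block compression is an assumption made for the sake of the example, and the code needs the Hadoop MapReduce client libraries on the classpath to compile.

```java
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical helper class; only the Hadoop calls inside it are real API.
final class CompressedSequenceFileSetup {

    // Configures an existing job to write block-compressed sequence files.
    static void configure(Job job) {
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // Static methods inherited from FileOutputFormat: turn compression on
        // and pick a codec (gzip is an assumption, any installed codec works).
        SequenceFileOutputFormat.setCompressOutput(job, true);
        SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        // Specific to sequence files: compress groups of records (BLOCK)
        // rather than each record on its own (RECORD).
        SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
    }
}
```

A driver would call CompressedSequenceFileSetup.configure(job) before submitting the job; the output key and value classes still have to be set separately.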
SequenceFileAsBinaryOutputFormat
SequenceFileAsBinaryOutputFormat, the counterpart to
SequenceFileAsBinaryInputFormat, writes keys and values in raw binary
format into a sequence file container.
MapFileOutputFormat
MapFileOutputFormat writes map files as output. The keys in a MapFile must be
added in order, so you need to ensure that your reducers emit keys in sorted order.
NOTE
The reduce input keys are guaranteed to be sorted, but the output keys are under the control of the reduce
function, and there is nothing in the general MapReduce contract that states that the reduce output keys
have to be ordered in any way. The extra constraint of sorted reduce output keys is needed only for
MapFileOutputFormat.
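The distinction between sorted reduce input keys and possibly unsorted reduce output keys can be illustrated without Hadoop at all. The sketch below is a plain-Java model, not Hadoop code: the class SortedOutputCheck and its methods are hypothetical names, and String.compareTo stands in for whatever key comparator a real job would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of the ordering constraint; not part of Hadoop.
final class SortedOutputCheck {

    // Simulates a reducer that emits one output key per input key,
    // after applying some transformation to the key.
    static List<String> emitKeys(List<String> sortedInputKeys,
                                 Function<String, String> keyTransform) {
        List<String> out = new ArrayList<>();
        for (String key : sortedInputKeys) {
            out.add(keyTransform.apply(key));
        }
        return out;
    }

    // Returns true if no key is smaller than its predecessor, which is
    // the property an ordered file format needs from its writer.
    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("apple", "banana", "cherry");

        // A reducer that passes keys through keeps the input order.
        System.out.println(isSorted(emitKeys(input, k -> k))); // prints true

        // A reducer that rewrites keys (here: reverses them) can break it.
        Function<String, String> reverse =
            k -> new StringBuilder(k).reverse().toString();
        System.out.println(isSorted(emitKeys(input, reverse))); // prints false
    }
}
```

A reducer of the second kind would be fine for most output formats but unsuitable for MapFileOutputFormat, since its keys would no longer be added in order.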
Multiple Outputs
FileOutputFormat and its subclasses generate a set of files in the output directory.
There is one file per reducer, and files are named by the partition number: part-r-00000,
part-r-00001, and so on. Sometimes there is a need to have more control over the naming
of the files or to produce multiple files per reducer. MapReduce comes with the
MultipleOutputs class to help you do this. [60]
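The default naming scheme described above can be reproduced in a few lines of plain Java. This is only a sketch of the pattern visible in the names part-r-00000 and part-r-00001, not Hadoop's own implementation: the class PartFileNames and the method reducerPartName are hypothetical, and treating five digits as a minimum width rather than a fixed width is an assumption.

```java
import java.util.Locale;

// Hypothetical helper that mimics the default reducer output file names.
final class PartFileNames {

    // Builds a name such as part-r-00003 from a reducer partition number.
    static String reducerPartName(int partition) {
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be non-negative");
        }
        // Zero-pad to at least five digits; Locale.ROOT keeps the digits ASCII.
        return String.format(Locale.ROOT, "part-r-%05d", partition);
    }

    public static void main(String[] args) {
        System.out.println(reducerPartName(0)); // prints part-r-00000
        System.out.println(reducerPartName(1)); // prints part-r-00001
    }
}
```

Because the name depends only on the partition number, a job with a fixed number of reducers always produces the same set of file names, which is why finer control over naming calls for something like MultipleOutputs.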