Each part of the final output is compressed; in this case, there is a single part:
% gunzip -c output/part-r-00000.gz
1949 111
1950 22
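To inspect the same output from a program instead of with gunzip, the compressed part file can be streamed and parsed directly. This is a minimal sketch that assumes the job wrote text output with the default tab between key and value, and that the file is a plain local copy rather than a file on HDFS. The helper name read_part_file is hypothetical, and the sketch writes its own small stand-in file so that it runs on its own.

```python
import gzip
import os
import tempfile


def read_part_file(path: str) -> dict[str, int]:
    """Decompress a gzipped text output file and parse its 'key<TAB>value' lines."""
    results: dict[str, int] = {}
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            key, value = line.rstrip("\n").split("\t", 1)
            results[key] = int(value)
    return results


# Write a small local stand-in for output/part-r-00000.gz so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    part = os.path.join(tmp, "part-r-00000.gz")
    with gzip.open(part, "wt", encoding="utf-8") as f:
        f.write("1949\t111\n1950\t22\n")
    max_temps = read_part_file(part)

print(max_temps)  # {'1949': 111, '1950': 22}
```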
If you are emitting sequence files for your output, you can set the mapreduce.output.fileoutputformat.compress.type property to control the type of compression to use. The default is RECORD, which compresses individual records. Changing this to BLOCK, which compresses groups of records, is recommended because it compresses better (see "The SequenceFile format").
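Why BLOCK tends to beat RECORD can be seen with a small experiment: short, similar records give a codec little to work with one at a time, while a group of them exposes repeated patterns and spreads the fixed per-stream overhead. The sketch below uses Python's zlib module as a stand-in codec and groups a fixed number of records per block. It ignores SequenceFile's actual on-disk layout entirely, and the function names and sample records are hypothetical.

```python
import zlib


def record_compressed_size(records: list[bytes]) -> int:
    """RECORD-style: compress every record on its own and sum the sizes."""
    return sum(len(zlib.compress(r)) for r in records)


def block_compressed_size(records: list[bytes], records_per_block: int = 100) -> int:
    """BLOCK-style: compress groups of records together and sum the sizes."""
    total = 0
    for start in range(0, len(records), records_per_block):
        block = b"\n".join(records[start:start + records_per_block])
        total += len(zlib.compress(block))
    return total


# Hypothetical sample data: short, similar "year<TAB>temperature" records.
records = [f"{1900 + i % 50}\t{(i * 37) % 400 - 100}".encode() for i in range(1000)]

raw = sum(len(r) for r in records)
per_record = record_compressed_size(records)
per_block = block_compressed_size(records)
print(f"raw={raw} RECORD-style={per_record} BLOCK-style={per_block}")
```

With records this short, compressing each one separately typically produces output larger than the raw data, because every compressed stream carries its own header and checksum.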
There is also a static convenience method on SequenceFileOutputFormat called setOutputCompressionType() to set this property.
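The benefit of such a helper is that it takes a typed value instead of a free-form string. In Java, it is called with the Job and a SequenceFile.CompressionType value such as BLOCK. The following sketch models the same idea in Python under loudly stated assumptions: a plain dict stands in for Hadoop's Configuration, and both the CompressionType enum and set_output_compression_type below are hypothetical mirrors, not Hadoop code.

```python
from enum import Enum

COMPRESS_TYPE_KEY = "mapreduce.output.fileoutputformat.compress.type"


class CompressionType(Enum):
    """Hypothetical mirror of the three sequence file compression types."""
    NONE = "NONE"
    RECORD = "RECORD"
    BLOCK = "BLOCK"


def set_output_compression_type(conf: dict[str, str], style: CompressionType) -> None:
    """Hypothetical stand-in for the Java helper: store the enum name under the property key."""
    if not isinstance(style, CompressionType):
        raise TypeError(f"expected CompressionType, got {type(style).__name__}")
    conf[COMPRESS_TYPE_KEY] = style.name


conf: dict[str, str] = {}
set_output_compression_type(conf, CompressionType.BLOCK)
print(conf)  # {'mapreduce.output.fileoutputformat.compress.type': 'BLOCK'}
```

A misspelled string such as "BLOK" cannot reach the configuration through this path, which is the main advantage over setting the property by hand.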
The configuration properties to set compression for MapReduce job outputs are summarized in Table 5-5. If your MapReduce driver uses the Tool interface (described in "GenericOptionsParser, Tool, and ToolRunner"), you can pass any of these properties to the program on the command line, which may be more convenient than modifying your program to hardcode the compression properties.
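Passing properties on the command line works because generic options such as -D key=value are consumed before the program's own arguments are handed to it. The sketch below is a hypothetical, much-reduced imitation of that -D handling only. It accepts just the separated "-D key=value" form, ignores GenericOptionsParser's other options (such as -conf, -fs, and -files), and the function name parse_generic_options is an assumption for illustration.

```python
def parse_generic_options(args: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split leading '-D key=value' pairs from the remaining program arguments.

    Hypothetical helper: mimics only the -D handling of a generic options parser.
    """
    conf: dict[str, str] = {}
    i = 0
    while i < len(args) and args[i] == "-D":
        if i + 1 >= len(args) or "=" not in args[i + 1]:
            raise ValueError(f"expected key=value after -D at position {i}")
        key, value = args[i + 1].split("=", 1)  # split on the first '=' only
        conf[key] = value
        i += 2
    return conf, args[i:]  # everything after the generic options goes to the program


argv = [
    "-D", "mapreduce.output.fileoutputformat.compress=true",
    "-D", "mapreduce.output.fileoutputformat.compress.codec="
          "org.apache.hadoop.io.compress.GzipCodec",
    "input", "output",
]
conf, remaining = parse_generic_options(argv)
print(conf)
print(remaining)  # ['input', 'output']
```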
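Table 5-5. MapReduce compression properties

Property name | Type | Default value | Description
mapreduce.output.fileoutputformat.compress | boolean | false | Whether to compress outputs
mapreduce.output.fileoutputformat.compress.codec | Class name | org.apache.hadoop.io.compress.DefaultCodec | The compression codec to use for outputs
mapreduce.output.fileoutputformat.compress.type | String | RECORD | The type of compression to use for sequence file outputs: NONE, RECORD, or BLOCK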
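The defaults in Table 5-5 mean that a job with no compression settings writes uncompressed output, and that setting only the codec has no effect until compression is switched on. The sketch below resolves the three properties against those defaults. It assumes a plain dict in place of Hadoop's Configuration, matches the type name exactly, and the names resolve_output_compression and OutputCompression are hypothetical.

```python
from dataclasses import dataclass

# Default values taken from Table 5-5.
DEFAULTS = {
    "mapreduce.output.fileoutputformat.compress": "false",
    "mapreduce.output.fileoutputformat.compress.codec": "org.apache.hadoop.io.compress.DefaultCodec",
    "mapreduce.output.fileoutputformat.compress.type": "RECORD",
}
VALID_TYPES = {"NONE", "RECORD", "BLOCK"}


@dataclass(frozen=True)
class OutputCompression:
    enabled: bool
    codec: str
    compression_type: str  # only relevant when the output is a sequence file


def resolve_output_compression(conf: dict[str, str]) -> OutputCompression:
    """Look up each property in conf, falling back to the Table 5-5 default."""
    def get(key: str) -> str:
        return conf.get(key, DEFAULTS[key])

    # In this sketch only the string "true" (any case) enables compression.
    enabled = get("mapreduce.output.fileoutputformat.compress").strip().lower() == "true"
    codec = get("mapreduce.output.fileoutputformat.compress.codec")
    compression_type = get("mapreduce.output.fileoutputformat.compress.type").strip()
    if compression_type not in VALID_TYPES:
        raise ValueError(f"unknown compression type: {compression_type}")
    return OutputCompression(enabled, codec, compression_type)


print(resolve_output_compression({}))
print(resolve_output_compression({
    "mapreduce.output.fileoutputformat.compress": "true",
    "mapreduce.output.fileoutputformat.compress.codec": "org.apache.hadoop.io.compress.GzipCodec",
    "mapreduce.output.fileoutputformat.compress.type": "BLOCK",
}))
```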