Each part of the final output is compressed; in this case, there is a single part:
% gunzip -c output/part-r-00000.gz
1949 111
1950 22
If you are emitting sequence files for your output, you can set the
mapreduce.output.fileoutputformat.compress.type property to control the
type of compression to use. The default is RECORD, which compresses individual records.
Changing this to BLOCK, which compresses groups of records, is recommended because it
compresses better (see The SequenceFile format).
There is also a static convenience method on SequenceFileOutputFormat called
setOutputCompressionType() to set this property.
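To see why compressing groups of records tends to beat compressing each record on its own, the following is a minimal sketch that uses only java.util.zip. It is not Hadoop's SequenceFile implementation: the class name, the method names, and the generated sample records are all assumptions made for illustration, and DEFLATE stands in for whichever codec a real job would use.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Illustrative only: mimics the idea behind RECORD vs. BLOCK compression,
// not the actual SequenceFile on-disk format.
class RecordVsBlockSketch {

    // Compress a byte array with DEFLATE and return the compressed length.
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] chunk = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();
        return total;
    }

    // RECORD-style: every record is compressed independently.
    static int recordStyleSize(List<String> records) {
        int total = 0;
        for (String record : records) {
            total += compressedSize(record.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    // BLOCK-style: records are grouped and compressed together,
    // so redundancy across records can be exploited.
    static int blockStyleSize(List<String> records) {
        String joined = String.join("", records);
        return compressedSize(joined.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical sample data shaped like the year/temperature output above.
    static List<String> sampleRecords(int count) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            records.add((1900 + i % 100) + "\t" + (i % 50) + "\n");
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> records = sampleRecords(1000);
        System.out.println("record-style bytes: " + recordStyleSize(records));
        System.out.println("block-style bytes:  " + blockStyleSize(records));
    }
}
```

Short records carry a fixed per-stream overhead and give the compressor little repetition to work with, which is why the grouped total comes out smaller in this sketch.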
The configuration properties to set compression for MapReduce job outputs are summarized
in Table 5-5. If your MapReduce driver uses the Tool interface (described in
GenericOptionsParser, Tool, and ToolRunner), you can pass any of these properties to the
program on the command line, which may be more convenient than modifying your program
to hardcode the compression properties.
Table 5-5. MapReduce compression properties
Property name                                      Type        Default value
mapreduce.output.fileoutputformat.compress         boolean     false
mapreduce.output.fileoutputformat.compress.codec   Class name  org.apache.hadoop.io.compress.DefaultCodec
mapreduce.output.fileoutputformat.compress.type    String      RECORD
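As a rough picture of how command-line properties take precedence over the defaults in the table, here is a small sketch in plain Java. It is a simplified stand-in, not the code of GenericOptionsParser: the class name, the method names, and the parsing logic are assumptions, and only the "-D name=value" form is handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a simplified model of "-D name=value" overrides.
// Hadoop's real option handling is done by GenericOptionsParser.
class CompressionPropsSketch {

    // Default values as listed in Table 5-5.
    static Map<String, String> defaults() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapreduce.output.fileoutputformat.compress", "false");
        conf.put("mapreduce.output.fileoutputformat.compress.codec",
                 "org.apache.hadoop.io.compress.DefaultCodec");
        conf.put("mapreduce.output.fileoutputformat.compress.type", "RECORD");
        return conf;
    }

    // Start from the defaults and apply each "-D name=value" pair in order.
    static Map<String, String> applyOverrides(String[] args) {
        Map<String, String> conf = defaults();
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-D")) {
                String[] pair = args[i + 1].split("=", 2);
                if (pair.length == 2) {
                    conf.put(pair[0], pair[1]);
                }
                i++; // skip the name=value token just consumed
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        String[] sampleArgs = {
            "-D", "mapreduce.output.fileoutputformat.compress=true",
            "-D", "mapreduce.output.fileoutputformat.compress.type=BLOCK"
        };
        applyOverrides(sampleArgs).forEach(
            (name, value) -> System.out.println(name + " = " + value));
    }
}
```

Properties that are not mentioned on the command line keep their defaults, so the codec in this sample stays at DefaultCodec while the other two values change.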