Compressing map output
Even if your MapReduce application reads and writes uncompressed data, it may benefit
from compressing the intermediate output of the map phase. The map output is written to
disk and transferred across the network to the reducer nodes, so by using a fast
compressor such as LZO, LZ4, or Snappy, you can get performance gains simply because the
volume of data to transfer is reduced. The configuration properties to enable compression
for map outputs and to set the compression format are shown in Table 5-6.
Table 5-6. Map output compression properties
Property name                        Type     Default value
mapreduce.map.output.compress        boolean  false
mapreduce.map.output.compress.codec  Class    org.apache.hadoop.io.compress.DefaultCodec
Here are the lines to add to enable gzip map output compression in your job (using the
new API):
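import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
conf.setBoolean(Job.MAP_OUTPUT_COMPRESS, true);
conf.setClass(Job.MAP_OUTPUT_COMPRESS_CODEC, GzipCodec.class,
    CompressionCodec.class);
Job job = new Job(conf);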
In the old API (see Appendix D), there are convenience methods on the JobConf object
for doing the same thing:
conf.setCompressMapOutput(true);
conf.setMapOutputCompressorClass(GzipCodec.class);
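To get a rough feel for why shrinking the intermediate data can pay off, the following sketch compresses some made-up map output and compares sizes. It is only an illustration under stated assumptions: it uses the JDK's java.util.zip.GZIPOutputStream as a stand-in rather than Hadoop's GzipCodec, the class name, the sampleMapOutput and gzipSize helpers, and the tab-separated record layout are all hypothetical, and the saving on real data will differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class MapOutputCompressionSketch {

    // Builds hypothetical map output: one tab-separated key/value record per line.
    static byte[] sampleMapOutput(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("year").append(1900 + (i % 100))
              .append('\t').append(i % 50).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Returns the number of bytes the data occupies after gzip compression.
    static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the stream flushes the gzip trailer into the buffer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        }
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = sampleMapOutput(100_000);
        int compressed = gzipSize(raw);
        // Fewer bytes here means less to spill to disk and ship to reducers.
        System.out.printf("raw: %d bytes, gzip: %d bytes%n", raw.length, compressed);
    }
}
```

The sample records are highly repetitive, so they compress well. Note that the sketch measures only size, not the CPU time spent compressing, which is the other half of the trade-off behind preferring a fast compressor for map output.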