To compress the output of a MapReduce job, set the mapreduce.output.fileoutputformat.compress property to true in the job configuration, and set the mapreduce.output.fileoutputformat.compress.codec property to the classname of the compression codec you want to use. Alternatively, you can use the static convenience methods on FileOutputFormat to set these properties, as shown in Example 5-4.
Example 5-4. Application to run the maximum temperature job producing compressed output
public class MaxTemperatureWithCompression {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperatureWithCompression <input path> " +
                    "<output path>");
            System.exit(-1);
        }

        Job job = new Job();
        job.setJarByClass(MaxTemperatureWithCompression.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setCombinerClass(MaxTemperatureReducer.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
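The same two properties can also be set without any code changes, for example in a job or cluster configuration file. A sketch, assuming gzip as the codec:

```xml
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```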
We run the program over compressed input (which doesn't have to use the same compression format as the output, although it does in this example) as follows:

% hadoop MaxTemperatureWithCompression input/ncdc/sample.txt.gz output
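Because GzipCodec writes standard gzip streams, the compressed part files the job produces can be read back by any gzip implementation, including the JDK's own java.util.zip classes. The following standalone sketch (JDK only, no Hadoop dependency; the sample record value is hypothetical) round-trips a tab-separated key/value line through gzip the same way a reducer output line is compressed:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress a record to gzip bytes, then read it back with a plain
    // GZIPInputStream, as any gzip-aware tool could do with a part file.
    public static String roundTrip(String record) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(record.getBytes(StandardCharsets.UTF_8));
        }
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(buf.toByteArray())),
                StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // A sample key/value line shaped like the job's output (hypothetical values)
        System.out.println(roundTrip("1949\t111"));
    }
}
```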