To compress the output of a MapReduce job, set the mapreduce.output.fileoutputformat.compress property to true in the job configuration, and set the mapreduce.output.fileoutputformat.compress.codec property to the classname of the compression codec you want to use. Alternatively, you can use the static convenience methods on FileOutputFormat to set these properties, as shown in Example 5-4.
Example 5-4. Application to run the maximum temperature job producing compressed output
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperatureWithCompression {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperatureWithCompression <input path> " +
          "<output path>");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(MaxTemperature.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Compress the job output using the gzip codec
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setCombinerClass(MaxTemperatureReducer.class);
    job.setReducerClass(MaxTemperatureReducer.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
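Equivalently, the two configuration properties mentioned above can be set directly on the job's configuration instead of going through the FileOutputFormat convenience methods. The following is a minimal sketch of that approach (the property names are those given in the text; the rest of the job setup is assumed to be unchanged from Example 5-4):

// Sketch: enable gzip-compressed output via configuration properties.
// Additionally requires org.apache.hadoop.conf.Configuration and
// org.apache.hadoop.io.compress.CompressionCodec imports.
Configuration conf = new Configuration();
conf.setBoolean("mapreduce.output.fileoutputformat.compress", true);
conf.setClass("mapreduce.output.fileoutputformat.compress.codec",
    GzipCodec.class, CompressionCodec.class);
Job job = new Job(conf);
// ... remaining job setup as in Example 5-4 ...

Either way the effect is the same; the convenience methods simply set these properties on your behalf.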
We run the program over compressed input (which doesn't have to use the same compression format as the output, although it does in this example) as follows:
% hadoop MaxTemperatureWithCompression input/ncdc/sample.txt.gz output