To compress the output of a MapReduce job, set the mapreduce.output.fileoutputformat.compress property to true in the job configuration, and set the mapreduce.output.fileoutputformat.compress.codec property to the classname of the compression codec you want to use. Alternatively, you can use the static convenience methods on FileOutputFormat to set these properties, as shown in Example 5-4.
Example 5-4. Application to run the maximum temperature job producing compressed output
public class MaxTemperatureWithCompression {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperatureWithCompression <input path> " +
                    "<output path>");
            System.exit(-1);
        }

        Job job = new Job();
        job.setJarByClass(MaxTemperatureWithCompression.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setCombinerClass(MaxTemperatureReducer.class);
        job.setReducerClass(MaxTemperatureReducer.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
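The same two properties can also be set without any code changes, for example in a job or cluster configuration file. A sketch, assuming gzip as the codec:

```xml
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```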
We run the program over compressed input (which doesn't have to use the same compression format as the output, although it does in this example) as follows:

% hadoop MaxTemperatureWithCompression input/ncdc/sample.txt.gz output
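Because GzipCodec writes standard gzip streams, the compressed part files the job produces can be read back by any gzip implementation, including the JDK's own java.util.zip classes. The following standalone sketch (JDK only, no Hadoop dependency; the sample record value is hypothetical) round-trips a tab-separated key/value line through gzip the same way a reducer output line is compressed:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Compress a record to gzip bytes, then read it back with a plain
    // GZIPInputStream, as any gzip-aware tool could do with a part file.
    public static String roundTrip(String record) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(record.getBytes(StandardCharsets.UTF_8));
        }
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(buf.toByteArray())),
                StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // A sample key/value line shaped like the job's output (hypothetical values)
        System.out.println(roundTrip("1949\t111"));
    }
}
```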