  @Override
  public int run(String[] args) throws Exception {
    Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
    if (job == null) {
      return -1;
    }

    job.setMapperClass(CleanerMapper.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job,
        CompressionType.BLOCK);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new SortDataPreprocessor(), args);
    System.exit(exitCode);
  }
}
Partial Sort
In "The Default MapReduce Job", we saw that, by default, MapReduce will sort input records by their keys. Example 9-4 is a variation for sorting sequence files with IntWritable keys.
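To see why sorting with a hash partitioner is only partial, it may help to simulate the partitioning step outside Hadoop. The following is a minimal sketch in plain Java, not code from the example: the class name PartialSortSketch, its methods, and the sample temperatures are all hypothetical. It assumes the usual HashPartitioner arithmetic, (hashCode & Integer.MAX_VALUE) % numPartitions, and that the hash code of an IntWritable is the int it wraps. Each simulated partition is sorted on its own, much as each reducer's output would be.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-alone simulation; it does not use any Hadoop classes.
class PartialSortSketch {

    // Mirrors the HashPartitioner arithmetic. For an int key the hash code
    // is the value itself, which is assumed to match IntWritable.hashCode().
    static int partitionFor(int key, int numPartitions) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numPartitions;
    }

    // Distributes the keys across partitions, then sorts each partition
    // independently, as each reducer would sort its own input.
    static List<List<Integer>> partialSort(int[] keys, int numPartitions) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : keys) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up temperature keys, for illustration only.
        int[] temps = {22, -11, 0, 317, 78, -5, 111, 44};
        List<List<Integer>> partitions = partialSort(temps, 3);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Every list printed by this sketch is in ascending order, but reading the partitions one after another does not yield a globally sorted sequence, because keys from the whole range are scattered across all of the partitions.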
Example 9-4. A MapReduce program for sorting a SequenceFile with IntWritable keys using the default HashPartitioner
public class SortByTemperatureUsingHashPartitioner extends Configured
    implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
    if (job == null) {
      return -1;
    }

    job.setInputFormatClass(SequenceFileInputFormat.class);