    job.setOutputKeyClass(IntWritable.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new SortByTemperatureUsingHashPartitioner(), args);
    System.exit(exitCode);
  }
}
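Because the driver runs through ToolRunner, the compression settings hard-coded above could instead be supplied at submission time with -D options (a sketch; it assumes Hadoop 2's output-compression property names, and the output path is illustrative):

```shell
% hadoop jar hadoop-examples.jar SortByTemperatureUsingHashPartitioner \
    -D mapreduce.output.fileoutputformat.compress=true \
    -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
    -D mapreduce.output.fileoutputformat.compress.type=BLOCK \
    input/ncdc/all-seq output
```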
CONTROLLING SORT ORDER
The sort order for keys is controlled by a RawComparator, which is found as follows:
1. If the property mapreduce.job.output.key.comparator.class is set, either explicitly or by calling setSortComparatorClass() on Job, then an instance of that class is used. (In the old API, the equivalent method is setOutputKeyComparatorClass() on JobConf.)
2. Otherwise, keys must be a subclass of WritableComparable, and the registered comparator for the key class is used.
3. If there is no registered comparator, a fallback RawComparator is used that deserializes the byte streams being compared into objects and delegates to the WritableComparable's compareTo() method.
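Rule 3 is the slow path because it creates objects for every comparison. An optimized comparator, by contrast, works on the serialized bytes directly. The following standalone sketch (a plain java.util.Comparator over 4-byte big-endian integers, which is IntWritable's wire format; it does not depend on Hadoop, and the class name is illustrative) mirrors the decode-in-place approach of IntWritable.Comparator:

```java
import java.util.Comparator;

// Compares serialized int keys without materializing Writable objects,
// in the spirit of Hadoop's IntWritable.Comparator.
class RawIntComparator implements Comparator<byte[]> {
    // Decode a big-endian, two's-complement int straight from the buffer.
    private static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) |  (b[off + 3] & 0xff);
    }

    @Override
    public int compare(byte[] b1, byte[] b2) {
        // Compare the decoded values; a raw lexicographic byte comparison
        // would mis-order negative numbers because of the sign bit.
        return Integer.compare(readInt(b1, 0), readInt(b2, 0));
    }
}
```

In Hadoop itself, logic like this would live in a WritableComparator subclass registered with WritableComparator.define(), so that rule 2 finds it rather than falling through to the deserializing fallback.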
These rules reinforce the importance of registering optimized versions of RawComparators for your own custom Writable classes, but they also show that it's straightforward to override the sort order by setting your own comparator (we do this in Secondary Sort).
Suppose we run this program using 30 reducers:

% hadoop jar hadoop-examples.jar SortByTemperatureUsingHashPartitioner \
    -D mapreduce.job.reduces=30 input/ncdc/all-seq output-hashsort