    job.setOutputKeyClass(IntWritable.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new SortByTemperatureUsingHashPartitioner(), args);
    System.exit(exitCode);
  }
}
CONTROLLING SORT ORDER
The sort order for keys is controlled by a RawComparator , which is found as follows:
1. If the property mapreduce.job.output.key.comparator.class is set, either explicitly or by calling setSortComparatorClass() on Job, then an instance of that class is used. (In the old API, the equivalent method is setOutputKeyComparatorClass() on JobConf.)
2. Otherwise, keys must be a subclass of WritableComparable, and the registered comparator for the key class is used.
3. If there is no registered comparator, then a RawComparator is used that deserializes the byte streams being compared into objects and delegates to the WritableComparable's compareTo() method.
These rules reinforce the importance of registering optimized versions of RawComparators for your own custom Writable classes (which is covered in Implementing a RawComparator for speed), and also show that it's straightforward to override the sort order by setting your own comparator (we do this in Secondary Sort).
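As a minimal sketch of the override described in rule 1 (not part of the book's example code), the comparator below reverses IntWritable's natural ordering so that keys would sort in descending order; the class name DescendingIntWritableComparator is made up for illustration, and the usual imports from org.apache.hadoop.io are assumed:

public class DescendingIntWritableComparator extends WritableComparator {
  public DescendingIntWritableComparator() {
    super(IntWritable.class, true); // create key instances so compare() receives real objects
  }

  @Override
  @SuppressWarnings("rawtypes")
  public int compare(WritableComparable a, WritableComparable b) {
    return -super.compare(a, b); // negate the natural (ascending) ordering
  }
}

Setting it on the job with job.setSortComparatorClass(DescendingIntWritableComparator.class); is all that rule 1 requires; the framework then uses it in place of the comparator registered for IntWritable.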
Suppose we run this program using 30 reducers:[62]

% hadoop jar hadoop-examples.jar SortByTemperatureUsingHashPartitioner \
  -D mapreduce.job.reduces=30 input/ncdc/all-seq output-hashsort