    job.setOutputKeyClass(IntWritable.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new SortByTemperatureUsingHashPartitioner(), args);
    System.exit(exitCode);
  }
}
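Because the driver runs through ToolRunner, the compression settings hard-coded above could instead be supplied at submission time with -D options (a sketch; it assumes Hadoop 2's output-compression property names, and the output path is illustrative):

```shell
% hadoop jar hadoop-examples.jar SortByTemperatureUsingHashPartitioner \
    -D mapreduce.output.fileoutputformat.compress=true \
    -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
    -D mapreduce.output.fileoutputformat.compress.type=BLOCK \
    input/ncdc/all-seq output
```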
CONTROLLING SORT ORDER
The sort order for keys is controlled by a RawComparator, which is found as follows:
1. If the property mapreduce.job.output.key.comparator.class is set, either explicitly or by calling setSortComparatorClass() on Job, then an instance of that class is used. (In the old API, the equivalent method is setOutputKeyComparatorClass() on JobConf.)
2. Otherwise, keys must be a subclass of WritableComparable, and the registered comparator for the key class is used.
3. If there is no registered comparator, a fallback RawComparator is used that deserializes the byte streams being compared into objects and delegates to the WritableComparable's compareTo() method.
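Rule 3 is the slow path because it creates objects for every comparison. An optimized comparator, by contrast, works on the serialized bytes directly. The following standalone sketch (a plain java.util.Comparator over 4-byte big-endian integers, which is IntWritable's wire format; it does not depend on Hadoop, and the class name is illustrative) mirrors the decode-in-place approach of IntWritable.Comparator:

```java
import java.util.Comparator;

// Compares serialized int keys without materializing Writable objects,
// in the spirit of Hadoop's IntWritable.Comparator.
class RawIntComparator implements Comparator<byte[]> {
    // Decode a big-endian, two's-complement int straight from the buffer.
    private static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) |  (b[off + 3] & 0xff);
    }

    @Override
    public int compare(byte[] b1, byte[] b2) {
        // Compare the decoded values; a raw lexicographic byte comparison
        // would mis-order negative numbers because of the sign bit.
        return Integer.compare(readInt(b1, 0), readInt(b2, 0));
    }
}
```

In Hadoop itself, logic like this would live in a WritableComparator subclass registered with WritableComparator.define(), so that rule 2 finds it rather than falling through to the deserializing fallback.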
These rules reinforce the importance of registering optimized versions of RawComparators for your own custom Writable classes, but they also show that it's straightforward to override the sort order by setting your own comparator (we do this in Secondary Sort).
Suppose we run this program using 30 reducers:

% hadoop jar hadoop-examples.jar SortByTemperatureUsingHashPartitioner \
    -D mapreduce.job.reduces=30 input/ncdc/all-seq output-hashsort