MapReduce Features - Hadoop: The Definitive Guide


  @Override
  public int run(String[] args) throws Exception {
    Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
    if (job == null) {
      return -1;
    }

    job.setMapperClass(CleanerMapper.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job,
        CompressionType.BLOCK);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new SortDataPreprocessor(), args);
    System.exit(exitCode);
  }

Partial Sort

In "The Default MapReduce Job", we saw that, by default, MapReduce will sort input records by their keys. Example 9-4 is a variation for sorting sequence files with IntWritable keys.
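To make the term "partial sort" concrete, here is a minimal plain-Java sketch, with no Hadoop dependency, of what happens to keys when more than one reducer is used with hash partitioning. It relies on two facts about Hadoop: HashPartitioner assigns a key to partition (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, and IntWritable's hash code is the wrapped int value. Everything else is an assumption made for illustration: the class name PartialSortSketch, its helper methods, and the sample temperature values are hypothetical and are not part of the book's example code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PartialSortSketch {

    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit,
    // then take the modulus. Integer.hashCode(key) is just the int value,
    // which matches what IntWritable.hashCode() returns.
    static int partitionFor(int key, int numPartitions) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numPartitions;
    }

    // Distribute the keys across partitions, then sort each partition
    // independently, as each reducer would see its own sorted input.
    static List<List<Integer>> partialSort(List<Integer> keys, int numPartitions) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : keys) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    // Concatenate partitions in order, like reading part-00000, part-00001, ...
    static List<Integer> concatenate(List<List<Integer>> partitions) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            all.addAll(partition);
        }
        return all;
    }

    static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical temperature keys, made up for this sketch.
        List<Integer> temps = List.of(250, -11, 78, 0, 122, -50, 33, 311);
        List<List<Integer>> parts = partialSort(temps, 3);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ": " + parts.get(i)
                + " sorted=" + isSorted(parts.get(i)));
        }
        System.out.println("concatenated sorted=" + isSorted(concatenate(parts)));
    }
}
```

Each partition comes out sorted on its own, but because the hash function scatters keys without regard to their order, concatenating the partitions does not in general give a globally sorted sequence.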

Example 9-4. A MapReduce program for sorting a SequenceFile with IntWritable keys using the default HashPartitioner

public class SortByTemperatureUsingHashPartitioner extends Configured
    implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
    if (job == null) {
      return -1;
    }

    job.setInputFormatClass(SequenceFileInputFormat.class);
