  @Override
  public int run(String[] args) throws Exception {
    Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
    if (job == null) {
      return -1;
    }

    job.setMapperClass(CleanerMapper.class);
    job.setOutputKeyClass(IntWritable.class);
    job.setOutputValueClass(Text.class);
    // Map-only job: with zero reducers, map output is written straight to the output files.
    job.setNumReduceTasks(0);
    // Write the output as a block-compressed SequenceFile using gzip.
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressorClass(job,
        GzipCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job,
        CompressionType.BLOCK);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new SortDataPreprocessor(), args);
    System.exit(exitCode);
  }
}
Partial Sort
In "The Default MapReduce Job," we saw that, by default, MapReduce will sort input records by their keys. Example 9-4 is a variation for sorting sequence files with IntWritable keys.
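To see why sorting with the default partitioner is only partial, here is a minimal plain-Java simulation that uses no Hadoop classes. It assumes the standard HashPartitioner arithmetic, (hashCode & Integer.MAX_VALUE) % numPartitions, and relies on an IntWritable's hash code being its int value. The class PartialSortSketch, its methods, and the sample keys are hypothetical and are not part of the example code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java simulation of hash partitioning followed by a per-partition sort.
// Hypothetical illustration only; no Hadoop classes are involved.
class PartialSortSketch {

    // Same arithmetic as HashPartitioner: clear the sign bit, then take the
    // remainder. The raw int stands in for an IntWritable key's hashCode().
    static int partitionFor(int key, int numPartitions) {
        return (key & Integer.MAX_VALUE) % numPartitions;
    }

    // Route each key to a partition, then sort every partition on its own,
    // much as each reducer receives its own keys in sorted order.
    static List<List<Integer>> partialSort(int[] keys, int numPartitions) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : keys) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        int[] keys = {22, -11, 0, 78, 151, 35, -50, 112};
        List<List<Integer>> parts = partialSort(keys, 3);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ": " + parts.get(i));
        }
    }
}
```

Each partition comes out in ascending order, but because keys are assigned by hash rather than by range, concatenating the partitions does not yield a globally sorted sequence.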
Example 9-4. A MapReduce program for sorting a SequenceFile with IntWritable keys using the default HashPartitioner
public class SortByTemperatureUsingHashPartitioner extends Configured
    implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
    if (job == null) {
      return -1;
    }

    job.setInputFormatClass(SequenceFileInputFormat.class);