    if (job == null) {
      return -1;
    }

    // Each whole file becomes one record: the file path as the key,
    // the file's bytes as the value, written out as a sequence file.
    job.setInputFormatClass(WholeFileInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(BytesWritable.class);

    job.setMapperClass(SequenceFileMapper.class);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new SmallFilesToSequenceFileConverter(), args);
    System.exit(exitCode);
  }
}
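The driver sets WholeFileInputFormat as the input format, but the format itself is defined earlier and not shown in this excerpt. For reference, here is a minimal sketch of what such a format looks like, assuming the convention this example depends on: each file is delivered to the mapper as a single record, with a NullWritable key and the file's raw bytes as a BytesWritable value. The WholeFileRecordReader helper class is illustrative; only the Hadoop API calls are standard:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Treats each file as a single, unsplittable record.
public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false; // never split a file: one file, one split, one record
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new WholeFileRecordReader();
  }
}

// Emits exactly one key-value pair: a null key and the whole file as bytes.
class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
  private FileSplit fileSplit;
  private Configuration conf;
  private final BytesWritable value = new BytesWritable();
  private boolean processed = false;

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context) {
    this.fileSplit = (FileSplit) split;
    this.conf = context.getConfiguration();
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    if (processed) {
      return false; // the single record has already been emitted
    }
    // Assumes small files: the whole file must fit in memory.
    byte[] contents = new byte[(int) fileSplit.getLength()];
    Path file = fileSplit.getPath();
    FileSystem fs = file.getFileSystem(conf);
    try (FSDataInputStream in = fs.open(file)) {
      IOUtils.readFully(in, contents, 0, contents.length);
    }
    value.set(contents, 0, contents.length);
    processed = true;
    return true;
  }

  @Override
  public NullWritable getCurrentKey() {
    return NullWritable.get();
  }

  @Override
  public BytesWritable getCurrentValue() {
    return value;
  }

  @Override
  public float getProgress() {
    return processed ? 1.0f : 0.0f;
  }

  @Override
  public void close() {
    // nothing held open between calls
  }
}

Because isSplitable() returns false, each map task reads exactly one file, which is what allows the mapper to treat its split's path as the record's key.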
Because the input format is a WholeFileInputFormat , the mapper only has to find
the filename for the input file split. It does this by casting the InputSplit from the
context to a FileSplit , which has a method to retrieve the file path. The path is stored
in a Text object for the key. The reducer is the identity (not explicitly set), and the output
format is a SequenceFileOutputFormat .
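Here is a minimal sketch of such a mapper, assuming (as above) that WholeFileInputFormat delivers NullWritable keys and BytesWritable values; the class name matches the one registered in the driver, but the body is illustrative rather than the definitive implementation:

// Nested inside SmallFilesToSequenceFileConverter; imports as in the previous
// sketch, plus org.apache.hadoop.io.Text and org.apache.hadoop.mapreduce.Mapper.
static class SequenceFileMapper
    extends Mapper<NullWritable, BytesWritable, Text, BytesWritable> {

  private Text filenameKey;

  @Override
  protected void setup(Context context) {
    InputSplit split = context.getInputSplit();
    Path path = ((FileSplit) split).getPath(); // the cast exposes getPath()
    filenameKey = new Text(path.toString());
  }

  @Override
  protected void map(NullWritable key, BytesWritable value, Context context)
      throws IOException, InterruptedException {
    context.write(filenameKey, value);
  }
}

Caching the filename key in setup() works because each split corresponds to exactly one file, so the key never changes within a task.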
Here's a run on a few small files. We've chosen to use two reducers, so we get two output
sequence files:
% hadoop jar hadoop-examples.jar SmallFilesToSequenceFileConverter \
-conf conf/hadoop-localhost.xml -D mapreduce.job.reduces=2 \
input/smallfiles output
Two part files are created, each of which is a sequence file. We can inspect these with the
-text option to the filesystem shell:
% hadoop fs -conf conf/hadoop-localhost.xml -text output/part-r-00000
hdfs://localhost/user/tom/input/smallfiles/a   61 61 61 61 61 61 61 61 61 61
hdfs://localhost/user/tom/input/smallfiles/c   63 63 63 63 63 63 63 63 63 63
hdfs://localhost/user/tom/input/smallfiles/e
% hadoop fs -conf conf/hadoop-localhost.xml -text output/part-r-00001
hdfs://localhost/user/tom/input/smallfiles/b   62 62 62 62 62 62 62 62 62 62
hdfs://localhost/user/tom/input/smallfiles/d   64 64 64 64 64 64 64 64 64 64
hdfs://localhost/user/tom/input/smallfiles/f   66 66 66 66 66 66 66 66 66 66
The keys are the file paths, and the values are the contents of each file rendered as hexadecimal bytes (61 is ASCII a); e was an empty file, so its record has an empty value. The way the files are divided between the two outputs is determined by the default HashPartitioner working on the keys.