if (job == null) {
    return -1;
}
job.setInputFormatClass(WholeFileInputFormat.class);
job.setOutputFormatClass(SequenceFileOutputFormat.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(BytesWritable.class);

job.setMapperClass(SequenceFileMapper.class);

return job.waitForCompletion(true) ? 0 : 1;
}

public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new SmallFilesToSequenceFileConverter(), args);
    System.exit(exitCode);
}
}
Because the input format is a WholeFileInputFormat, the mapper only has to find the filename for the input file split. It does this by casting the InputSplit from the context to a FileSplit, which has a method to retrieve the file path. The path is stored in a Text object for the key. The reducer is the identity (not explicitly set), and the output format is a SequenceFileOutputFormat.
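Based on that description, the SequenceFileMapper could be sketched roughly as follows. This is an assumption built on the standard Hadoop Mapper API, not the book's exact listing; in particular, resolving the path once in setup() rather than in map() is a design choice that assumes WholeFileInputFormat emits a single record per split:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Sketch: maps each whole file to a (file path, file bytes) pair.
static class SequenceFileMapper
        extends Mapper<NullWritable, BytesWritable, Text, BytesWritable> {

    private Text filenameKey;

    @Override
    protected void setup(Context context)
            throws IOException, InterruptedException {
        // Cast the generic InputSplit to a FileSplit to get at the file path.
        InputSplit split = context.getInputSplit();
        Path path = ((FileSplit) split).getPath();
        filenameKey = new Text(path.toString());
    }

    @Override
    protected void map(NullWritable key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        // The value is the entire file's contents; the key is its path.
        context.write(filenameKey, value);
    }
}
```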
Here's a run on a few small files. We've chosen to use two reducers, so we get two output
sequence files:
% hadoop jar hadoop-examples.jar SmallFilesToSequenceFileConverter \
-conf conf/hadoop-localhost.xml -D mapreduce.job.reduces=2 \
input/smallfiles output
Two part files are created, each of which is a sequence file. We can inspect these with the -text option to the filesystem shell:
% hadoop fs -conf conf/hadoop-localhost.xml -text output/part-r-00000
hdfs://localhost/user/tom/input/smallfiles/a 61 61 61 61 61
61 61 61 61 61
hdfs://localhost/user/tom/input/smallfiles/c 63 63 63 63 63
63 63 63 63 63
hdfs://localhost/user/tom/input/smallfiles/e
% hadoop fs -conf conf/hadoop-localhost.xml -text output/part-r-00001
hdfs://localhost/user/tom/input/smallfiles/b 62 62 62 62 62
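One detail worth noting about this listing: -text renders each BytesWritable value as space-separated hex bytes, which is why file a's run of 'a' characters appears as 61 61 61 ... (0x61 is ASCII 'a'), and file e shows no value bytes, consistent with it being empty. A minimal illustration of that rendering (the HexDump class and toHex method are hypothetical names, not part of Hadoop):

```java
public class HexDump {
    // Render a byte array the way the listing above shows values:
    // each byte as a two-digit lowercase hex value, space-separated.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // File "a" held repeated 'a' characters, hence the 61 61 ... output.
        System.out.println(toHex("aaaaa".getBytes()));
    }
}
```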