In this setup, you should think of Map2 and Reduce as the core of the MapReduce job,
with the standard partitioning and shuffling applied between the mapper and reducer.
You should consider Map1 as a preprocessing step and Map3 and Map4 as postprocessing
steps. The number of preprocessing and postprocessing steps can vary; this is only an example.
You specify the composition of this sequence of mappers and the reducer in the
driver; see listing 5.1. You need to make sure that the key and value output types
(classes) of one task match the input types of the next task.
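Concretely, "matching types" means that each Mapper implementation's four generic parameters line up with the four class arguments passed to ChainMapper.addMapper() for that step. As a sketch, a Map1 that consumes the LongWritable/Text pairs produced by TextInputFormat and emits Text/Text pairs for Map2 might look like the following; the class body is illustrative, and only its type signature is implied by listing 5.1:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Illustrative preprocessing mapper: the four type parameters
// (LongWritable, Text in; Text, Text out) must match the four class
// arguments given to ChainMapper.addMapper() for Map1 in listing 5.1.
public class Map1 extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Placeholder body: emit the input line as both key and value,
        // so Map2 (which expects Text/Text input) can consume it.
        output.collect(value, value);
    }
}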
Listing 5.1 Driver for chaining mappers within a MapReduce job
Configuration conf = getConf();
JobConf job = new JobConf(conf);
job.setJobName("ChainJob");
job.setInputFormat(TextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);

// Add Map1 step to job: input LongWritable/Text, output Text/Text
JobConf map1Conf = new JobConf(false);
ChainMapper.addMapper(job,
                      Map1.class,
                      LongWritable.class,
                      Text.class,
                      Text.class,
                      Text.class,
                      true,          // pass key/value pairs by value
                      map1Conf);

// Add Map2 step to job: input Text/Text, output LongWritable/Text
JobConf map2Conf = new JobConf(false);
ChainMapper.addMapper(job,
                      Map2.class,
                      Text.class,
                      Text.class,
                      LongWritable.class,
                      Text.class,
                      true,
                      map2Conf);

// Add Reduce step to job: input LongWritable/Text, output Text/Text
JobConf reduceConf = new JobConf(false);
ChainReducer.setReducer(job,
                        Reduce.class,
                        LongWritable.class,
                        Text.class,
                        Text.class,
                        Text.class,
                        true,
                        reduceConf);

// Add Map3 step to job: input Text/Text, output LongWritable/Text
JobConf map3Conf = new JobConf(false);
ChainReducer.addMapper(job,
                       Map3.class,
                       Text.class,
                       Text.class,
                       LongWritable.class,
                       Text.class,
                       true,
                       map3Conf);

// Add Map4 step to job: input LongWritable/Text, output LongWritable/Text
// (the extracted listing is cut off during the Map3 call; Map3's trailing
// arguments and this final step follow the pattern of the steps above)
JobConf map4Conf = new JobConf(false);
ChainReducer.addMapper(job,
                       Map4.class,
                       LongWritable.class,
                       Text.class,
                       LongWritable.class,
                       Text.class,
                       true,
                       map4Conf);

JobClient.runJob(job);
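Note that the driver calls getConf(), which assumes this code runs inside a class that extends Configured and implements Tool. A minimal sketch of that surrounding scaffolding follows; the class name ChainJobDriver and the argument handling are illustrative, not part of the original listing:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver class; its run() method would hold the
// configuration code shown in listing 5.1.
public class ChainJobDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Path in = new Path(args[0]);   // the 'in' path used in listing 5.1
        Path out = new Path(args[1]);  // the 'out' path used in listing 5.1
        // ... body of listing 5.1 goes here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic Hadoop options (-D, -fs, -jt, ...)
        // into the Configuration before invoking run().
        int res = ToolRunner.run(new Configuration(), new ChainJobDriver(), args);
        System.exit(res);
    }
}

Running the job through ToolRunner is what lets users override configuration properties on the command line, which is why the driver reads getConf() instead of constructing a fresh Configuration itself.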
 
 