In this setup, you should think of Map2 and Reduce as the core of the MapReduce job,
with the standard partitioning and shuffling applied between the mapper and reducer.
You should consider Map1 as a preprocessing step and Map3 and Map4 as postprocessing
steps. The number of preprocessing and postprocessing steps can vary; this is only an example.
You specify the composition of this sequence of mappers and the reducer in the
driver; see listing 5.1. You need to make sure that the key and value output types
(classes) of one task match the input types of the next task.
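Concretely, "matching types" means that each Mapper implementation's four generic parameters line up with the four class arguments passed to ChainMapper.addMapper() for that step. As a sketch, a Map1 that consumes the LongWritable/Text pairs produced by TextInputFormat and emits Text/Text pairs for Map2 might look like the following; the class body is illustrative, and only its type signature is implied by listing 5.1:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Illustrative preprocessing mapper: the four type parameters
// (LongWritable, Text in; Text, Text out) must match the four class
// arguments given to ChainMapper.addMapper() for Map1 in listing 5.1.
public class Map1 extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Placeholder body: emit the input line as both key and value,
        // so Map2 (which expects Text/Text input) can consume it.
        output.collect(value, value);
    }
}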
Listing 5.1 Driver for chaining mappers within a MapReduce job
Configuration conf = getConf();
JobConf job = new JobConf(conf);
job.setJobName("ChainJob");
job.setInputFormat(TextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);

// Add Map1 step to job: input LongWritable/Text, output Text/Text
JobConf map1Conf = new JobConf(false);
ChainMapper.addMapper(job,
                      Map1.class,
                      LongWritable.class,
                      Text.class,
                      Text.class,
                      Text.class,
                      true,          // pass key/value pairs by value
                      map1Conf);

// Add Map2 step to job: input Text/Text, output LongWritable/Text
JobConf map2Conf = new JobConf(false);
ChainMapper.addMapper(job,
                      Map2.class,
                      Text.class,
                      Text.class,
                      LongWritable.class,
                      Text.class,
                      true,
                      map2Conf);

// Add Reduce step to job: input LongWritable/Text, output Text/Text
JobConf reduceConf = new JobConf(false);
ChainReducer.setReducer(job,
                        Reduce.class,
                        LongWritable.class,
                        Text.class,
                        Text.class,
                        Text.class,
                        true,
                        reduceConf);

// Add Map3 step to job: input Text/Text, output LongWritable/Text
JobConf map3Conf = new JobConf(false);
ChainReducer.addMapper(job,
                       Map3.class,
                       Text.class,
                       Text.class,
                       LongWritable.class,
                       Text.class,
                       true,
                       map3Conf);

// Add Map4 step to job: input LongWritable/Text, output LongWritable/Text
// (the extracted listing is cut off during the Map3 call; Map3's trailing
// arguments and this final step follow the pattern of the steps above)
JobConf map4Conf = new JobConf(false);
ChainReducer.addMapper(job,
                       Map4.class,
                       LongWritable.class,
                       Text.class,
                       LongWritable.class,
                       Text.class,
                       true,
                       map4Conf);

JobClient.runJob(job);
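Note that the driver calls getConf(), which assumes this code runs inside a class that extends Configured and implements Tool. A minimal sketch of that surrounding scaffolding follows; the class name ChainJobDriver and the argument handling are illustrative, not part of the original listing:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver class; its run() method would hold the
// configuration code shown in listing 5.1.
public class ChainJobDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Path in = new Path(args[0]);   // the 'in' path used in listing 5.1
        Path out = new Path(args[1]);  // the 'out' path used in listing 5.1
        // ... body of listing 5.1 goes here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic Hadoop options (-D, -fs, -jt, ...)
        // into the Configuration before invoking run().
        int res = ToolRunner.run(new Configuration(), new ChainJobDriver(), args);
        System.exit(res);
    }
}

Running the job through ToolRunner is what lets users override configuration properties on the command line, which is why the driver reads getConf() instead of constructing a fresh Configuration itself.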
 
 