Writing basic MapReduce programs - Hadoop in Action

1000051 4541310
1000054 4946631
1000065 4748968
1000067 5312208,4944640,5071294
1000070 4928425,5009029

We have discovered that patents 5312208, 4944640, and 5071294 cited patent 1000067. In this section we won't dwell on the MapReduce data flow, which we've already covered in chapter 3. Instead, we focus on the structure of a MapReduce program. We need only one file for the entire program, as you can see in listing 4.1.
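Before turning to the template, the inversion behind the output above can be sketched in plain Java, without any Hadoop classes, to make the roles of the map and reduce steps concrete. This is an illustrative sketch rather than the book's code: `CitationInverter` and its method names are hypothetical, the in-memory grouping stands in for Hadoop's shuffle, and each input record is assumed to be a (citing, cited) pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Hadoop or of listing 4.1.
class CitationInverter {

    // Map step plus grouping: swap each (citing, cited) pair so the cited
    // patent becomes the key, then collect the citing patents under it.
    // In a real job the grouping is done by the framework, not by user code.
    static Map<String, List<String>> mapAndGroup(List<String[]> citations) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : citations) {
            String citing = pair[0];
            String cited = pair[1];
            grouped.computeIfAbsent(cited, k -> new ArrayList<>()).add(citing);
        }
        return grouped;
    }

    // Reduce step: join all citing patents of one cited patent with commas.
    static String reduce(List<String> citing) {
        return String.join(",", citing);
    }

    // Runs both steps and returns cited patent -> comma-separated citing patents.
    static Map<String, String> invert(List<String[]> citations) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : mapAndGroup(citations).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> citations = Arrays.asList(
            new String[] {"5312208", "1000067"},
            new String[] {"4944640", "1000067"},
            new String[] {"5071294", "1000067"},
            new String[] {"4541310", "1000051"});
        // Prints one line per cited patent, sorted by patent number:
        // 1000051<TAB>4541310
        // 1000067<TAB>5312208,4944640,5071294
        invert(citations).forEach(
            (cited, citing) -> System.out.println(cited + "\t" + citing));
    }
}
```

In this sketch the citing patents appear in input order. A real MapReduce job makes no such promise about the order in which values reach the reducer, so the order within each comma-separated list may differ from run to run.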

Listing 4.1 Template for a typical Hadoop program

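import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;

public class MyJob extends Configured implements Tool {

    public static class MapClass extends MapReduceBase
        implements Mapper<Text, Text, Text, Text> {

        public void map(Text key, Text value,
                        OutputCollector<Text, Text> output,
                        Reporter reporter) throws IOException {
            // Swap key and value so the output is keyed by the input value.
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output,
                           Reporter reporter) throws IOException {
            // Join all values for this key into one comma-separated string.
            String csv = "";
            while (values.hasNext()) {
                if (csv.length() > 0) csv += ",";
                csv += values.next().toString();
            }
            output.collect(key, new Text(csv));
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();

        JobConf job = new JobConf(conf, MyJob.class);

        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormat(KeyValueTextInputFormat.class);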
