The map method is defined at line 79:
79 public void map(LongWritable key, Text value, OutputCollector<Text,
80 IntWritable> output, Reporter reporter) throws IOException
As in the last example, a StringTokenizer (line 90) breaks the line into words, and a while loop then outputs each
word as a key-value pair in which the key is the word and the value is 1. Line 95 also increments the INPUT_WORDS
counter once for every word:
90 StringTokenizer tokenizer = new StringTokenizer(line);
91 while (tokenizer.hasMoreTokens())
92 {
93 word.set(tokenizer.nextToken());
94 output.collect(word, one);
95 reporter.incrCounter(Counters.INPUT_WORDS, 1);
96 }
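The tokenizing loop can be tried outside Hadoop. What follows is a minimal plain-Java sketch, not the example's own code: the class name MapSketch, the method mapLine, and the list of map entries standing in for the OutputCollector are all assumptions made for illustration, and a plain static field stands in for the INPUT_WORDS counter.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for the Map class; no Hadoop types are used.
class MapSketch {
    // Stands in for the INPUT_WORDS counter (an assumption, not Hadoop's Reporter).
    static long inputWords = 0;

    // Emits one (word, 1) pair per whitespace-separated token, in input order.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            output.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
            inputWords++;
        }
        return output;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : mapLine("the cat and the hat")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Note that a repeated word such as "the" is emitted once per occurrence; no totals are computed at this stage.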
The Reduce class is defined at line 108:
108 public static class Reduce extends MapReduceBase implements Reducer
109 <Text, IntWritable, Text, IntWritable>
The reduce method, beginning at line 117, sums the values collected for each word and outputs the word and its total as a key-value pair:
117 while (values.hasNext())
118 {
119 sum += values.next().get();
120 }
121 output.collect(key, new IntWritable(sum));
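The summing loop can likewise be exercised without Hadoop. This is a minimal sketch under stated assumptions: ReduceSketch and its reduce method are hypothetical names, and an Iterator of Integer stands in for the iterator of IntWritable values that Hadoop passes in for one key.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for the Reduce class; no Hadoop types are used.
class ReduceSketch {
    // Sums every count grouped under one key, mirroring the while loop at line 117.
    static int reduce(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Suppose the map phase emitted the word "the" three times.
        int total = reduce(Arrays.asList(1, 1, 1).iterator());
        System.out.println("the\t" + total);
    }
}
```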
There is now a run method (starting at line 125) that contains the functionality from example 1's main method. It
sets the mapper, combiner, and reducer classes, followed by the input and output format classes:
133 conf.setMapperClass(Map.class);
134 conf.setCombinerClass(Reduce.class);
135 conf.setReducerClass(Reduce.class);
136
137 conf.setInputFormat(TextInputFormat.class);
138 conf.setOutputFormat(TextOutputFormat.class);
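Line 134 reuses the Reduce class as the combiner. That is safe here because the reduce step is a plain sum, and adding up partial sums gives the same total as adding up the raw counts. The plain-Java check below illustrates the point; CombinerSketch and its method names are hypothetical, and each inner list stands in for the counts one mapper emitted for a single word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of why a summing reducer can double as a combiner.
class CombinerSketch {
    // The same summing logic serves as both combiner and reducer.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    // No combiner: every raw count from every mapper reaches the reducer.
    static int withoutCombiner(List<List<Integer>> countsPerMapper) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> counts : countsPerMapper) {
            all.addAll(counts);
        }
        return sum(all);
    }

    // With a combiner: each mapper's counts are summed locally first,
    // then the reducer sums the partial totals.
    static int withCombiner(List<List<Integer>> countsPerMapper) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> counts : countsPerMapper) {
            partials.add(sum(counts));
        }
        return sum(partials);
    }

    public static void main(String[] args) {
        List<List<Integer>> counts =
                Arrays.asList(Arrays.asList(1, 1), Arrays.asList(1, 1, 1));
        System.out.println(withoutCombiner(counts) + " " + withCombiner(counts));
    }
}
```

Both methods return the same total; the combiner only reduces how many values have to travel from the mappers to the reducer.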
The new run method parses the -skip command-line option, adds the pattern file named after it to the distributed
cache, and sets the wordcount.skip.patterns property to true. This processing begins at line 143:
143 if ("-skip".equals(args[i]))
144 {
145 DistributedCache.addCacheFile(new Path(args[++i]).toUri(), conf);
146 conf.setBoolean("wordcount.skip.patterns", true);
147 }
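The argument handling can be sketched in plain Java as well. Everything here beyond the -skip test and the wordcount.skip.patterns key is an assumption made for illustration: the class SkipOptionSketch, the map standing in for the job configuration, the list standing in for the distributed cache, the collection of the remaining arguments, the guard against a missing file name, and the file name patterns.txt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the option parsing in run; no Hadoop types are used.
class SkipOptionSketch {
    final Map<String, String> conf = new HashMap<>();     // stands in for the job configuration
    final List<String> cacheFiles = new ArrayList<>();    // stands in for the distributed cache
    final List<String> otherArgs = new ArrayList<>();     // arguments that are not options

    void parse(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if ("-skip".equals(args[i])) {
                // Guard added for this sketch only.
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-skip requires a pattern file name");
                }
                // ++i consumes the file name so the loop does not see it again.
                cacheFiles.add(args[++i]);
                conf.put("wordcount.skip.patterns", "true");
            } else {
                otherArgs.add(args[i]);
            }
        }
    }

    public static void main(String[] args) {
        SkipOptionSketch job = new SkipOptionSketch();
        job.parse(new String[] {"input", "output", "-skip", "patterns.txt"});
        System.out.println(job.cacheFiles + " " + job.otherArgs + " " + job.conf);
    }
}
```

The pre-increment in args[++i] is the detail worth noticing: it reads the argument that follows -skip and advances the loop index past it in one step.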
 