Describing the Example 1 Code
This section walks through the Java code for the simple Map Reduce word-count example. We then
compile it, create a jar file, and run it. The package is named org.myorg and is declared at line 1:
01 package org.myorg;
Lines 6 to 10 import Hadoop functionality for Path, configuration, I/O, Map Reduce, and utilities.
06 import org.apache.hadoop.fs.Path;
07 import org.apache.hadoop.conf.*;
08 import org.apache.hadoop.io.*;
09 import org.apache.hadoop.mapred.*;
10 import org.apache.hadoop.util.*;
For the details of these APIs, consult https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/ . Just
select the package name (for example, mapred) and then choose package-summary.html . The fs.Path class provides file
system path functionality, and the conf package adds configuration functionality, as shown at line 51 in the example. The
io package adds input/output functionality, while the util package adds utilities like logging and checksums. The mapred
package is the Hadoop V1 Map Reduce API implementation. The V1 package is named mapred , while the V2
implementation that you will encounter later uses the name mapreduce .
The Map class is defined at line 15:
15 public static class Map extends MapReduceBase implements
16 Mapper<LongWritable, Text, Text, IntWritable>
Line 25 uses a StringTokenizer to break the input line into words. Each word is then emitted as a key-value
pair of the form <word,1>.
25 StringTokenizer tokenizer = new StringTokenizer(line);
26 while (tokenizer.hasMoreTokens())
27 {
28 word.set(tokenizer.nextToken());
29 output.collect(word, one);
30 }
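To see the tokenizing loop in isolation, here is a minimal sketch that runs without Hadoop. The class MapStepSketch, its Pair helper, and the sample sentence are hypothetical names made up for illustration; a plain list stands in for the OutputCollector, and a String and an int stand in for Text and IntWritable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for the map step; it uses no Hadoop classes.
class MapStepSketch {

    // Simple holder for one emitted <word,count> pair (illustration only).
    static final class Pair {
        final String word;
        final int count;

        Pair(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // Emits one <word,1> pair per whitespace-separated token,
    // mirroring the loop at lines 25 to 30 of the example.
    static List<Pair> map(String line) {
        List<Pair> output = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            output.add(new Pair(tokenizer.nextToken(), 1));
        }
        return output;
    }

    public static void main(String[] args) {
        for (Pair pair : map("the cat saw the dog")) {
            System.out.println("<" + pair.word + "," + pair.count + ">");
        }
    }
}
```

Note that the repeated word "the" produces two separate <the,1> pairs. The map step does no counting of its own; totaling is left to the reduce step.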
As defined at line 34, the Reduce class accepts shuffled key-value pairs as input:
34 public static class Reduce extends MapReduceBase implements
35 Reducer<Text, IntWritable, Text, IntWritable>
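The word "shuffled" refers to the grouping that the framework performs between the map and reduce steps: all values emitted for the same key are collected together and presented to the reducer key by key. The following is only a rough in-memory sketch of that idea, not how Hadoop implements it. The class ShuffleSketch and the sample words are hypothetical, and a sorted map stands in for the framework's sort-and-group phase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the shuffle phase; it uses no Hadoop classes.
class ShuffleSketch {

    // Takes the words emitted by the map step (each with an implied count of 1)
    // and groups the 1s under their word, with keys in sorted order.
    static SortedMap<String, List<Integer>> shuffle(List<String> mappedWords) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : mappedWords) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> mapped = Arrays.asList("the", "cat", "saw", "the", "dog");
        // Prints {cat=[1], dog=[1], saw=[1], the=[1, 1]}
        System.out.println(shuffle(mapped));
    }
}
```

Each entry in the grouped result corresponds to one call to the reducer: the key is the word, and the list plays the role of the values iterator.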
Starting at line 40, the code totals the values of all key-value pairs that share a key and outputs a single
key-value pair carrying that total, for example <word,5>:
40 int sum = 0;
41 while (values.hasNext())
42 {
43 sum += values.next().get();
44 }
45 output.collect(key, new IntWritable(sum));
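The summing loop can also be tried on its own, outside Hadoop. In the sketch below, the class ReduceStepSketch is a hypothetical name, a plain Iterator<Integer> stands in for the Iterator<IntWritable> that the framework passes to the reducer, and the method returns the total instead of calling output.collect.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for the reduce step; it uses no Hadoop classes.
class ReduceStepSketch {

    // Totals all values for one key, mirroring lines 40 to 44 of the example.
    static int reduce(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Five <word,1> pairs for the same key arrive as five 1s.
        int total = reduce(Arrays.asList(1, 1, 1, 1, 1).iterator());
        System.out.println("<word," + total + ">"); // prints <word,5>
    }
}
```

Because the loop adds whatever values arrive rather than counting them, the same logic would still give the right total if the incoming values were partial sums instead of 1s.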
 