Describing the Example 1 Code
In this section, we look at the Java code for the simple Java-based Map Reduce word-count example, and then compile it, create a jar file, and run it. The code belongs to the package org.myorg, which is declared at line 1:
01 package org.myorg;
Lines 6 to 10 import Hadoop functionality for Path, configuration, I/O, Map Reduce, and utilities.
06 import org.apache.hadoop.fs.Path;
07 import org.apache.hadoop.conf.*;
08 import org.apache.hadoop.io.*;
09 import org.apache.hadoop.mapred.*;
10 import org.apache.hadoop.util.*;
For the details of these APIs, consult https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/. Select the package name (for example, mapred) and then choose package-summary.html. The fs.Path class provides file system path functionality, and the conf package adds configuration functionality, as used at line 51 of the example. The io package adds input/output functionality, including the Writable types Text, IntWritable, and LongWritable used in this example, while the util package adds utilities like logging and checksums. The mapred package is the Hadoop V1 Map Reduce API implementation. The V1 classes live in the mapred package, while the V2 implementation that you will encounter later in the topic uses the mapreduce package.
The Map class is defined at line 15:
15 public static class Map extends MapReduceBase implements
16 Mapper<LongWritable, Text, Text, IntWritable>
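The four type parameters of Mapper at line 16 are the input key, input value, output key, and output value. Assuming the job reads its input with TextInputFormat (the usual choice for this example), the input key is the byte offset of each line in the file and the input value is the text of that line. The following plain-Java sketch, which needs no Hadoop libraries, computes those (offset, line) pairs. The class InputRecordSketch and its method toRecords are hypothetical names, Long and String stand in for LongWritable and Text, and only '\n' is treated as a line terminator.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch (no Hadoop dependency) of the records that
// TextInputFormat is assumed to feed to Map.map():
//   key   = byte offset of the line in the file (LongWritable in Hadoop)
//   value = the text of the line               (Text in Hadoop)
// Simplification: only '\n' is treated as a line terminator.
class InputRecordSketch {

    static List<Map.Entry<Long, String>> toRecords(String fileContents) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < fileContents.length()) {
            int end = fileContents.indexOf('\n', start);
            if (end < 0) {
                end = fileContents.length(); // last line has no terminator
            }
            String line = fileContents.substring(start, end);
            records.add(Map.entry(offset, line));
            // Next offset: skip this line's UTF-8 bytes plus the '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, String> r : toRecords("the quick fox\nthe lazy dog")) {
            System.out.println(r.getKey() + " -> " + r.getValue());
        }
    }
}
```

Running main prints the offset 0 for the first line and 14 for the second, because "the quick fox" occupies 13 bytes plus one newline. The offset itself is not used by the word-count mapper; only the line value matters.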
Line 25 uses a StringTokenizer to break the input line into words, which are then emitted as key-value pairs of the form <word,1>:
25 StringTokenizer tokenizer = new StringTokenizer(line);
26 while (tokenizer.hasMoreTokens())
27 {
28 word.set(tokenizer.nextToken());
29 output.collect(word, one);
30 }
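The loop at lines 25 to 30 can be tried outside Hadoop with a small sketch. Here String stands in for Text, Integer for IntWritable, and a java.util.function.BiConsumer stands in for OutputCollector.collect(); these substitutions, and the names MapStepSketch, map, and mapToList, are assumptions made so the code runs on a bare JDK. java.util.StringTokenizer with no delimiter argument splits only on space, tab, newline, carriage return, and form feed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.function.BiConsumer;

// Hypothetical plain-Java sketch of the map step at lines 25-30.
// String replaces Text, Integer replaces IntWritable, and BiConsumer
// replaces OutputCollector.collect(); none of these are Hadoop APIs.
class MapStepSketch {

    static void map(long offset, String line, BiConsumer<String, Integer> output) {
        // The offset (input key) is not needed for word counting.
        // Default delimiters are " \t\n\r\f".
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            output.accept(tokenizer.nextToken(), 1); // emit <word,1>
        }
    }

    // Convenience wrapper that collects the emitted pairs into a list.
    static List<Map.Entry<String, Integer>> mapToList(long offset, String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        map(offset, line, (word, one) -> pairs.add(Map.entry(word, one)));
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(mapToList(0L, "Hadoop, hadoop\tHadoop"));
    }
}
```

For the sample line in main, the tokens are "Hadoop,", "hadoop", and "Hadoop": punctuation stays attached and case is preserved, so these become different keys unless the line is normalized before tokenizing.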
The Reduce class, defined at line 34, accepts the shuffled key-value pairs as input:
34 public static class Reduce extends MapReduceBase implements
35 Reducer<Text, IntWritable, Text, IntWritable>
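"Shuffled" here means that, between the map and reduce steps, the framework groups all emitted pairs by key and presents the keys to the reducer in sorted order, each with its collection of values. The following in-memory sketch only illustrates that grouping; it is an assumption for illustration, not how Hadoop implements the shuffle, which partitions, sorts, and transfers map output across the cluster. The class ShuffleSketch and method shuffle are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the grouping a reducer sees:
// all values for the same key are collected together, and keys are sorted.
// Hadoop's real shuffle is distributed; this only models its result.
class ShuffleSketch {

    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapOutput) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("the", 1), Map.entry("fox", 1),
                Map.entry("the", 1), Map.entry("dog", 1));
        System.out.println(shuffle(mapOutput)); // {dog=[1], fox=[1], the=[1, 1]}
    }
}
```

Each entry of the resulting map corresponds to one call of the reducer: the key, plus an iterator over its list of values.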
From line 40 on, the code totals the values of all key-value pairs that share the same key and outputs one totaled pair per key, for example <word,5>:
40 int sum = 0;
41 while (values.hasNext())
42 {
43 sum += values.next().get();
44 }
45 output.collect(key, new IntWritable(sum));
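The summing loop at lines 40 to 45 can be reproduced in plain Java as a sketch. String stands in for Text and Integer for IntWritable, and the returned Map.Entry stands in for the output.collect() call; these substitutions and the names ReduceStepSketch and reduce are assumptions so that the code runs without Hadoop.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch of the reduce step at lines 40-45.
// Integer replaces IntWritable, and the returned entry replaces
// output.collect(key, new IntWritable(sum)).
class ReduceStepSketch {

    static Map.Entry<String, Integer> reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next(); // the Hadoop code calls values.next().get()
        }
        return Map.entry(key, sum); // the totaled <word,sum> pair
    }

    public static void main(String[] args) {
        // Five <word,1> pairs for the same key collapse into <word,5>.
        System.out.println(reduce("word", List.of(1, 1, 1, 1, 1).iterator()));
    }
}
```

Because the loop adds whatever values arrive rather than counting them, it also gives the right total if some values are already partial sums greater than 1.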