The input file on the left of Figure 4-1 is split into chunks of data. The size of these splits is controlled by the
FileInputFormat class of the Map Reduce job, which divides the input into InputSplit units. The number of splits is
influenced by the HDFS block size, and one Mapper task is created for each split data chunk.
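Under a simplified model (assumed here: the split size equals the HDFS block size, ignoring the minimum and maximum split-size settings that FileInputFormat also honors), the number of splits, and hence the number of map tasks, can be estimated with ceiling division:

```java
public class SplitEstimate {

    // Simplified model: one split per HDFS block.
    // The real FileInputFormat calculation also considers the configured
    // minimum and maximum split sizes, so treat this as an estimate only.
    static long numSplits(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes; // ceiling division
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // Hadoop 1.x default block size: 64 MB
        // A 200 MB file spans four 64 MB blocks, so roughly four map tasks.
        System.out.println(numSplits(200L * 1024 * 1024, blockSize)); // prints 4
    }
}
```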
Each split data chunk (for example, “one one two three”) is sent to a Mapper process on a Data Node server. In the
word-count example, the Map process creates a series of key-value pairs in which the key is a word (for instance,
“one”) and the value is the count 1. These key-value pairs are then shuffled into lists grouped by key. The shuffled
lists are input to the Reduce tasks, which reduce the list volume by summing the values for each key (in this simple
example). The Reduce output is then a simple list of summed key-value pairs.
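The map, shuffle, and reduce stages described above can be simulated in plain Java without Hadoop. The sketch below is illustrative only (it is not the Hadoop API); it applies the three stages to the sample input “one one two three”:

```java
import java.util.*;

public class WordCountSketch {

    // Map stage: emit a (word, 1) pair for every word in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            pairs.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        return pairs;
    }

    // Shuffle stage: group the emitted values into one list per key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce stage: sum each key's list of values.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(reduce(shuffle(map("one one two three"))));
        // prints {one=2, three=1, two=1}
    }
}
```

In the real framework, the shuffle is performed between cluster nodes by Hadoop itself; here it is a single in-memory grouping step.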
The Map Reduce framework takes care of all other tasks, such as task scheduling and resource management.
Map Reduce Native
Now, it's time to put the word-count algorithm to work, starting with the most basic example: the Map Reduce native
version. The term Map Reduce native means that the Map Reduce code is written in Java using the functionality
provided by the Hadoop core libraries within the Hadoop installation directory. Map Reduce native word-count
algorithms are available from the Apache Software Foundation Hadoop 1.2.1 Map Reduce tutorial on the Hadoop
website (hadoop.apache.org/docs/r1.2.1/). For instance, I sourced two versions of the Hadoop word-count
algorithm and stored them in Java files in a word-count directory, as follows:
[hadoop@hc1nn wordcount]$ pwd
/usr/local/hadoop/wordcount
[hadoop@hc1nn wordcount]$ ls *.java
wc-ex1.java wc-ex2.java
The file wc-ex1.java contains the first, simple example; the file wc-ex2.java contains a second, more complex
version.
Java Word-Count Example 1
Consider the Java code for the first word-count example. It follows the basic word-count steps shown in Figure 4-1:
package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount
{

  public static class Map extends MapReduceBase implements
      Mapper<LongWritable, Text, Text, IntWritable>
  {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
