MapReduce - Hadoop: The Definitive Guide

Database Reference

In-Depth Information

(1950, 0)

(1950, 22)

(1950, −11)

(1949, 111)

(1949, 78)

The output from the map function is processed by the MapReduce framework before be-

ing sent to the reduce function. This processing sorts and groups the key-value pairs by

key. So, continuing the example, our reduce function sees the following input:

(1949, [111, 78])

(1950, [0, 22, −11])

Each year appears with a list of all its air temperature readings. All the reduce function

has to do now is iterate through the list and pick up the maximum reading:

(1949, 111)

(1950, 22)

This is the final output: the maximum global temperature recorded in each year.

The whole data flow is illustrated in Figure 2-1 . At the bottom of the diagram is a Unix

pipeline, which mimics the whole MapReduce flow and which we will see again later in

this chapter when we look at Hadoop Streaming.

Figure 2-1. MapReduce logical data flow

Java MapReduce

Having run through how the MapReduce program works, the next step is to express it in

code. We need three things: a map function, a reduce function, and some code to run the

job. The map function is represented by the Mapper class, which declares an abstract

map() method. Example 2-3 shows the implementation of our map function.

Example 2-3. Mapper for the maximum temperature example

import java.io.IOException ;

import org.apache.hadoop.io.IntWritable ;

Search WWH ::

Custom Search

Home