MapReduce programming model
MapReduce is based on functional programming models, borrowed largely from Lisp. Typically, users
implement two functions:
Map (in_key, in_value) -> (out_key, intermediate_value) list
The user-written Map function receives an input key-value pair and, after its computation,
produces a set of intermediate key-value pairs.
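As an illustration of the Map signature above, the following is a minimal sketch of a word-count mapper in plain Python. It is not tied to any MapReduce library; the function name map_word_count and the choice of a document name as the input key are assumptions made for this example.

```python
from typing import Iterator, Tuple


def map_word_count(in_key: str, in_value: str) -> Iterator[Tuple[str, int]]:
    """Map (in_key, in_value) -> (out_key, intermediate_value) list.

    in_key is a hypothetical document name (unused here);
    in_value is the text of that document.
    """
    for word in in_value.lower().split():
        # Emit one intermediate pair per word occurrence.
        yield (word, 1)


pairs = list(map_word_count("doc1", "the quick the lazy"))
print(pairs)
```

A single input pair can yield many intermediate pairs, or none at all if the text is empty.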
Library functions then group together all intermediate values associated with the same
intermediate key I and pass them to the Reduce function.
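The grouping step can be pictured with a small in-memory sketch. A real framework performs this across machines and on disk; the helper name group_by_key and the decision to sort the keys are assumptions of this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect every intermediate value under its intermediate key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Return keys in sorted order so the result is deterministic.
    return {key: groups[key] for key in sorted(groups)}


intermediate = [("the", 1), ("quick", 1), ("the", 1)]
grouped = group_by_key(intermediate)
print(grouped)
```

Each entry of the result corresponds to one call of the Reduce function: the key, together with all of its values.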
Reduce (out_key, intermediate_value list) -> out_value list
The user-written Reduce function accepts an intermediate key I and the set of
values for that key.
It merges these values to form a possibly smaller set of values.
Typically, each Reduce invocation produces zero or one output value.
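A matching sketch of a Reduce function for word counting is shown below. The name reduce_sum is hypothetical, and returning a list of length zero or one is simply one way to mirror the "zero or one output value" behavior described above.

```python
from typing import Iterable, List


def reduce_sum(out_key: str, intermediate_values: Iterable[int]) -> List[int]:
    """Reduce (out_key, intermediate_value list) -> out_value list."""
    total = 0
    seen_any = False
    for value in intermediate_values:
        total += value
        seen_any = True
    # One output value if the key had any values, otherwise none.
    return [total] if seen_any else []


print(reduce_sum("the", [1, 1, 1]))
print(reduce_sum("unused", []))
```

Here three intermediate values are merged into a single count, which is the "possibly smaller set" the text refers to.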
The intermediate values are supplied to the Reduce function via an iterator. The iterator
makes it possible to handle lists of values that are too large to fit in memory at once.
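The benefit of an iterator can be sketched with a Python generator, which produces values one at a time instead of building the whole list. The generator stream_values is a hypothetical stand-in for the framework's value iterator, and the count of one million is arbitrary.

```python
from typing import Iterable, Iterator


def stream_values(n: int) -> Iterator[int]:
    """Hypothetical value iterator: yields n ones lazily, one per step."""
    for _ in range(n):
        yield 1


def reduce_count(key: str, values: Iterable[int]) -> int:
    total = 0
    for value in values:
        # Only the current value and the running total are held in memory.
        total += value
    return total


result = reduce_count("the", stream_values(1_000_000))
print(result)
```

Because the reducer only keeps a running total, its memory use does not grow with the number of values for the key.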
The MapReduce framework consists of a library of different interfaces. The major interfaces
used by developers are Mapper, Reducer, JobConf, JobClient, Partitioner, OutputCollector, Reporter,
InputFormat, OutputFormat, and OutputCommitter.
Figure 4.8 shows the overall architecture of MapReduce. The main components of this
architecture include:
Mapper: maps input key-value pairs to a set of intermediate key-value pairs. A given input pair
may map to zero or many output pairs. By default, the framework spawns one map task for
each input split.
Reducer: performs a number of tasks, including sorting and grouping the mapper outputs and
shuffling partitions.
FIGURE 4.8
MapReduce architecture (components shown: Input, Map, Partition, Combine, Reducer, Output).
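The stages named in Figure 4.8 can be tied together in a single-process sketch. Everything here is an assumption made for illustration: the function names, the use of zlib.crc32 as the partitioning hash, running the combiner before partitioning, and treating each input record as one map task. A real framework distributes these stages across machines.

```python
import zlib
from collections import defaultdict


def map_fn(in_key, text):
    # Map: emit (word, 1) for every word in the input text.
    for word in text.split():
        yield word, 1


def combine(pairs):
    # Combine: pre-aggregate one mapper's output locally.
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    # Partition: pick a reducer for the key. crc32 is stable across runs,
    # unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_fn(key, values):
    # Reduce: merge all values for one key into a single count.
    return key, sum(values)


def run_job(inputs, num_reducers=2):
    # One bucket per reducer, each mapping key -> list of values.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for in_key, in_value in inputs:
        for key, value in combine(map_fn(in_key, in_value)):
            buckets[partition(key, num_reducers)][key].append(value)
    output = {}
    for bucket in buckets:
        # Each reducer sees its keys in sorted order.
        for key in sorted(bucket):
            out_key, total = reduce_fn(key, bucket[key])
            output[out_key] = total
    return output


result = run_job([("doc1", "a b a"), ("doc2", "b c")])
print(result)
```

The combiner shrinks the data handed to the reducers (the two occurrences of "a" in doc1 travel as a single pair), while the partitioner guarantees that all values for a given key reach the same reducer.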
 