The key used for the sort is the key emitted during the Map phase, so all
pairs that share a key are sorted to the same place.
Although Shuffle only moves data around and is a null operation from a
computational standpoint, it is often the slowest part of a MapReduce
operation. Consider what has to happen: you have a lot of data (terabytes,
potentially) and you need to compute a global ordering over it. Strictly
speaking, Shuffle needs only to hash partition the data. That is, it only
has to ensure that all data with the same key ends up in the same place;
the keys do not actually have to be sorted. However, Hadoop does a complete
merge sort, which makes things easier for Reducers that rely on ordering.
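The difference between hash partitioning and a full sort can be sketched in
a few lines of Python. This is a minimal illustration, not Hadoop's
implementation: the names partition_for and hash_partition, the use of CRC32
as the hash, and the sample pairs are all assumptions made for the example.

```python
from collections import defaultdict
import zlib


def partition_for(key, num_reducers):
    """Pick a partition for a key (hypothetical helper).

    CRC32 is used only because it gives the same answer on every run;
    any deterministic hash would do.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def hash_partition(pairs, num_reducers):
    """Route each (key, value) pair to a partition without sorting anything."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return partitions


# Made-up Mapper output, for illustration only.
pairs = [("day", 1), ("to", 1), ("day", 1), ("time", 1), ("to", 1)]
partitions = hash_partition(pairs, num_reducers=3)

for partition_id, items in sorted(partitions.items()):
    print(partition_id, items)
```

Each pair is handled in a single pass, and every pair with a given key lands
in the same partition, yet nothing is ever ordered. A sort-based Shuffle
gives the same co-location guarantee and additionally delivers keys in
order, at the cost of the sort itself.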
Reduce Phase
The Reduce phase is what enables MapReduce to aggregate data across the
entire data set. It is based on the functional programming Reduce operation
which, like Map, applies a function across the elements of a collection but
combines them into a single result rather than returning a collection of
results. The Reducer function is a bit different, however: it takes two
arguments, a key and a collection of inputs. Remember that the Mapper
function returned key-value pairs and Shuffle sorted those pairs by key.
Reduce calls the Reducer function once for each unique key returned by a
Mapper, passing it all the values that were produced with that key.
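As a rough sketch of that calling convention, the following Python groups
sorted key-value pairs and invokes a Reducer once per unique key. The
function names reduce_phase and max_reducer and the sample data are
hypothetical, and the in-memory sort merely stands in for Shuffle.

```python
from itertools import groupby
from operator import itemgetter


def reduce_phase(mapper_output, reducer):
    """Call reducer(key, values) once for each unique key."""
    # Stand-in for Shuffle: sorting by key makes equal keys adjacent.
    shuffled = sorted(mapper_output, key=itemgetter(0))
    results = []
    for key, group in groupby(shuffled, key=itemgetter(0)):
        values = [value for _, value in group]
        results.append(reducer(key, values))
    return results


def max_reducer(key, values):
    """Hypothetical Reducer: keep the largest value seen for each key."""
    return (key, max(values))


# Made-up Mapper output, for illustration only.
mapper_output = [("b", 4), ("a", 1), ("b", 7), ("a", 3), ("c", 2)]
reduced = reduce_phase(mapper_output, max_reducer)
print(reduced)  # [('a', 3), ('b', 7), ('c', 2)]
```

The Reducer never sees values belonging to a different key, which is what
allows separate keys to be reduced independently of one another.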
To see how this works, consider the previous word count example. The
Mapper returns pairs, each consisting of a word and the number of times it
appears in an input line. Shuffle then sorts these pairs by key. The Reduce
phase calls the Reducer with each word as the key and a list of that word's
counts from each line. The Reducer then simply sums this list and emits the
word together with its total count.
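The word count flow just described might look like the following in Python.
This is a sketch under stated assumptions: the tokenizing rule (lowercase
the line, keep runs of letters and apostrophes), the function names, and the
two sample lines are all invented for illustration, and everything runs in
one process rather than on a cluster.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter
import re


def mapper(line):
    """Emit one (word, count) pair per distinct word in a single line."""
    # Assumed tokenizer: lowercase, then runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", line.lower())
    return list(Counter(words).items())


def reducer(word, counts):
    """Sum the per-line counts for one word."""
    return (word, sum(counts))


def word_count(lines):
    # Map: every line yields its own (word, count) pairs.
    mapped = [pair for line in lines for pair in mapper(line)]
    # Shuffle: sort by word so equal words are adjacent.
    shuffled = sorted(mapped, key=itemgetter(0))
    # Reduce: one Reducer call per unique word.
    return [
        reducer(word, [count for _, count in group])
        for word, group in groupby(shuffled, key=itemgetter(0))
    ]


# Made-up input lines, for illustration only.
lines = ["the cat and the hat", "the hat"]
totals = word_count(lines)
print(totals)  # [('and', 1), ('cat', 1), ('hat', 2), ('the', 3)]
```

Here the Reducer for "the" receives the list [2, 1], one count from each
line in which the word appears, and emits a total of 3.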
MapReduce Example
The canonical MapReduce example is counting word frequencies in a data
set. Consider a MapReduce operation that will count the word frequencies
in the following three lines of text:
1: "Tomorrow, and tomorrow, and tomorrow"
2: "Creeps in this petty pace from day to day"
3: "To the last syllable of recorded time"