Figure 10.1 illustrates the MapReduce processing for a single input—in this case, a
line of text.
Figure 10.1 Example of how MapReduce works
In this example, the map step parses the provided text string into individual words
and emits a set of key/value pairs of the form <word, 1>. For each unique key (in
this example, each distinct word), the reduce step sums the 1 values and outputs the
<word, count> key/value pairs. Because the word "each" appeared twice in the given
line of text, the reduce step provides a corresponding key/value pair of <each, 2>.
Note that, in this example, the original key, 1234, is ignored in the
processing. In a typical word count application, the map step may be applied to
millions of lines of text, and the reduce step summarizes the key/value pairs
generated by all the map steps.
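The word count steps described above can be sketched in a few lines of Python. This is a minimal single-machine illustration, not a distributed implementation: the function names map_step, shuffle, and reduce_step, and the sample line of text, are assumptions made for this sketch and do not come from Figure 10.1.

```python
from collections import defaultdict

def map_step(key, line):
    """Emit one (word, 1) pair per word in the line.

    The input key (for example, a line identifier such as 1234) is ignored.
    """
    return [(word, 1) for word in line.lower().split()]

def shuffle(pairs):
    """Group the emitted values by key so each word is reduced once."""
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return grouped

def reduce_step(word, ones):
    """Sum the 1 values for a single word and return (word, count)."""
    return (word, sum(ones))

# Hypothetical input line, chosen so that "each" appears twice.
line = "each task runs once and each result is summed"

pairs = map_step(1234, line)
counts = dict(reduce_step(word, ones) for word, ones in shuffle(pairs).items())
print(counts["each"])  # 2
```

In a real MapReduce framework, the grouping between the map and reduce steps is handled by the framework itself; the shuffle function here only stands in for that behavior.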
Expanding on the word count example, the final output of a MapReduce process
applied to a set of documents might have the key as an ordered pair and the value
as an ordered tuple of length 2n. A possible representation of such a key/value pair
follows:
<(filename, datetime), (word1, 5, word2, 7, …, wordn, 6)>
In this construction, the key is the ordered pair of filename and datetime. The
value consists of the n pairs of words and their individual counts in the
corresponding file.
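One way to picture this key/value pair is the following Python sketch. The function name document_record, the sample filename, the timestamp string, and the alphabetical ordering of the words are all assumptions made for illustration; the text does not specify how the words in the value are ordered.

```python
from collections import Counter

def document_record(filename, datetime_str, text):
    """Build one <(filename, datetime), (word1, n1, ..., wordn, nn)> pair.

    The value is a flat tuple of length 2n, where n is the number of
    distinct words in the document.
    """
    counts = Counter(text.lower().split())
    # Flatten the (word, count) pairs; sorting by word is an assumption
    # made here only to give the tuple a stable order.
    value = tuple(item for pair in sorted(counts.items()) for item in pair)
    return (filename, datetime_str), value

# Hypothetical document used only to show the shape of the output.
key, value = document_record("report.txt", "2015-01-27 10:00", "data big data")
print(key)    # ('report.txt', '2015-01-27 10:00')
print(value)  # ('big', 1, 'data', 2)
```

Here the document has n = 2 distinct words, so the value tuple has length 4.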
Of course, a word count problem could be addressed in many ways other than
MapReduce. However, MapReduce has the advantage of being able to distribute
the workload over a cluster of computers and run the tasks in parallel. In a word
count, the documents, or even pieces of the documents, could be processed