The Mapper function is called once for each of these lines. It doesn't
matter in which order the calls happen, or even whether the three calls are
performed on the same machine. The output of the three Mapper calls looks
like:
1: [{tomorrow, 3}, {and, 2}]
2: [{creeps, 1}, {in, 1}, {this, 1}, {petty, 1}, {pace, 1}, {from, 1}, {day, 2}, {to, 1}]
3: [{to, 1}, {the, 1}, {last, 1}, {syllable, 1}, {of, 1}, {recorded, 1}, {time, 1}]
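As a rough illustration, a Mapper of this kind could be sketched in Python as follows. The function name mapper, the punctuation handling, and the lowercasing are assumptions made for this sketch, and the three input lines are assumed to be the lines of text that the counts above correspond to; none of this is the actual implementation used by any particular MapReduce system.

```python
from collections import Counter

def mapper(line):
    """Return (word, count) pairs for one line, in order of first appearance."""
    # Assumed normalization: strip surrounding punctuation and lowercase.
    words = [w.strip(".,;:!?").lower() for w in line.split()]
    # Counter keeps insertion order, so pairs follow first appearance.
    return list(Counter(w for w in words if w).items())

# Assumed input: the three lines the word counts above appear to come from.
lines = [
    "Tomorrow, and tomorrow, and tomorrow,",
    "Creeps in this petty pace from day to day,",
    "To the last syllable of recorded time",
]

# Each call is independent, so the calls could run in any order
# or on different machines.
mapped = [mapper(line) for line in lines]
for number, pairs in enumerate(mapped, start=1):
    print(number, pairs)
```

Because each call sees only its own line, nothing in mapper depends on shared state, which is what lets the calls be distributed freely.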
Next, the Shuffle phase goes to work on the Mapper's output and produces
the following:
{and, [2]}
{creeps, [1]}
{day, [2]}
{from, [1]}
{in, [1]}
{last, [1]}
{of, [1]}
{pace, [1]}
{petty, [1]}
{recorded, [1]}
{syllable, [1]}
{the, [1]}
{this, [1]}
{time, [1]}
{to, [1, 1]}
{tomorrow, [3]}
The shuffler output is mostly uninteresting except for "to", which is the only
word to appear on more than one line. The "to" entry contains a list of two
elements, one for each time it appeared in the Mapper's output.
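The grouping that the Shuffle phase performs can be sketched in a few lines of Python. This is only a minimal single-process illustration: the function name shuffle and the choice to sort keys alphabetically are assumptions made to mirror the listing above, and a real framework would do this work across machines.

```python
from collections import defaultdict

def shuffle(mapper_outputs):
    """Group the counts from all Mapper calls by word."""
    grouped = defaultdict(list)
    for pairs in mapper_outputs:
        for word, count in pairs:
            # One list element per appearance of the word in Mapper output.
            grouped[word].append(count)
    # Sorted by word here only to match the listing above.
    return dict(sorted(grouped.items()))

# The three Mapper outputs shown earlier, written as Python tuples.
mapper_outputs = [
    [("tomorrow", 3), ("and", 2)],
    [("creeps", 1), ("in", 1), ("this", 1), ("petty", 1),
     ("pace", 1), ("from", 1), ("day", 2), ("to", 1)],
    [("to", 1), ("the", 1), ("last", 1), ("syllable", 1),
     ("of", 1), ("recorded", 1), ("time", 1)],
]

shuffled = shuffle(mapper_outputs)
for word, counts in shuffled.items():
    print(word, counts)
```

Note that "day" keeps the single value 2 because both occurrences were on one line and the Mapper had already counted them, whereas "to" gets two separate entries because it came from two different Mapper calls.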
Finally, this data is passed to the Reducer, which sums the list for each
word to produce that word's total count. The results follow: