Understanding the map phase
In a MapReduce application, the map function reads all of its input as key and value
pairs and writes its processed output as key and value pairs as well. Processing data
as key and value pairs works well in a distributed computing environment, because
records that share a key can be grouped together and handled independently on
different nodes.
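The pair-in, pair-out shape can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop code: `read_records` and `map_fn` are hypothetical names, and keying each input line by its byte offset is an assumption modeled on how Hadoop's default text input presents lines to a mapper.

```python
from typing import Iterator, Tuple


def read_records(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (byte offset, line) pairs, one pair per line of input text."""
    offset = 0
    for line in text.splitlines(keepends=True):
        yield offset, line.rstrip("\r\n")
        offset += len(line.encode("utf-8"))


def map_fn(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Take one input pair and emit zero or more output pairs."""
    # The input key (the offset) is ignored here; only the line text is used.
    for token in value.split():
        yield token, 1


text = "sea shells\nsea shore\n"
records = list(read_records(text))
pairs = [pair for key, value in records for pair in map_fn(key, value)]

print(records)  # [(0, 'sea shells'), (11, 'sea shore')]
print(pairs)    # [('sea', 1), ('shells', 1), ('sea', 1), ('shore', 1)]
```

Both the input and the output of `map_fn` are pairs, but their types differ: the input key is a position in the file, while the output key is whatever the job wants to group on.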
Let's see how MapReduce works with the help of an example. The word count program is
known as the Hello, World program of MapReduce. The program counts how many times
each word occurs in an input set of text files.
For this example, let's consider a file with the following line in it:
She sells sea shells on the sea shore where she also sells cookies.
If the preceding text is provided as input to the word count program, the expected
output is as follows. Note that "She" and "she" are counted as the same word and the
trailing period is not part of "cookies":
she, 2
sells, 2
sea, 2
shells, 1
on, 1
the, 1
shore, 1
where, 1
also, 1
cookies, 1
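The counts above can be reproduced with a small in-memory sketch of the same idea. It runs in a single Python process rather than on a Hadoop cluster, and the function names (`map_words`, `group_by_key`, `reduce_counts`) are hypothetical. Lowercasing the words and stripping punctuation are assumptions inferred from the expected output, which merges "She" with "she" and drops the period after "cookies".

```python
import string
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    for token in line.split():
        # Assumed normalization: trim punctuation and ignore case.
        word = token.strip(string.punctuation).lower()
        if word:
            yield word, 1


def group_by_key(pairs):
    """Collect all values emitted for the same key into one list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, ones):
    """Reduce step: add up the 1s emitted for a single word."""
    return word, sum(ones)


line = "She sells sea shells on the sea shore where she also sells cookies."
groups = group_by_key(map_words(line))
counts = dict(reduce_counts(word, ones) for word, ones in groups.items())

for word, count in counts.items():
    print(f"{word}, {count}")
```

In this sketch the words are printed in order of first appearance, which happens to match the listing above; a real job may order its output differently.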
The three major components of a MapReduce program are:
• Driver
• Mapper
• Reducer
The driver component of a MapReduce program is responsible for setting up the job
configuration and submitting the job to the Hadoop cluster. This part of the program
runs on the client computer.