The driver component of the word count program takes two parameters when it submits
the job to the Hadoop cluster:
• The location of the input files
• The location of the output directory
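With Hadoop Streaming, the mapper and reducer can be ordinary scripts, and the driver's two parameters correspond directly to its `-input` and `-output` options. The following is a minimal sketch, not the book's own driver: it only assembles and prints the command line so that it runs without a cluster. The jar location and the script names `mapper.py` and `reducer.py` are hypothetical placeholders that depend on the installation.

```python
import sys

# Hypothetical path: the streaming jar's location varies between installations.
STREAMING_JAR = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar"


def build_streaming_command(input_path, output_path,
                            mapper="mapper.py", reducer="reducer.py"):
    """Assemble a Hadoop Streaming command from the driver's two parameters.

    mapper.py and reducer.py are hypothetical script names.
    """
    return [
        "hadoop", "jar", STREAMING_JAR,
        "-input", input_path,    # location of the input files
        "-output", output_path,  # output directory; it must not exist yet
        "-mapper", mapper,
        "-reducer", reducer,
    ]


def main(argv):
    # Expect exactly two user-supplied arguments: input and output paths.
    if len(argv) != 3:
        sys.stderr.write("usage: driver.py <input path> <output path>\n")
        return 2
    cmd = build_streaming_command(argv[1], argv[2])
    # Print instead of executing, so the sketch works without a cluster.
    print(" ".join(cmd))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Running `python driver.py /data/in /data/out` prints the command that would submit the job; if the output directory already exists, Hadoop rejects the job rather than overwrite it.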
Once the job is submitted to the cluster, the mapper reads each line of the input file as a
<key, value> pair. For the line mentioned earlier, the key is the byte offset of the line
within the file and the value is the entire sentence.
The mapper reads the line as follows:
<0000, She sells sea shells on the sea shore where she also sells cookies>
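With Hadoop's default text input format, the key is the byte offset at which a line starts and the value is the line's contents, so the first line always has key 0. The pure-Python imitation below is only an illustration of that behaviour; the function name `read_records` is hypothetical and this is not Hadoop's own record reader.

```python
def read_records(data: bytes):
    """Yield (byte_offset, line) pairs, imitating how a text input format
    hands each line of a file to the mapper. Line terminators are stripped."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        line = raw.rstrip(b"\r\n").decode("utf-8")
        yield offset, line
        # Advance by the full byte length of the line, terminator included.
        offset += len(raw)


sample = (b"She sells sea shells on the sea shore where she also sells cookies\n"
          b"Second line\n")
for key, value in read_records(sample):
    print(key, value)
```

The second line's key is the byte length of the first line plus its newline, which is why offsets, rather than line numbers, identify each record.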
The mapper logic then emits a <key, value> pair for each word in the sentence, as
follows:
<she, 1>
<sells, 1>
<sea, 1>
<shells, 1>
<on, 1>
<the, 1>
<sea, 1>
<shore, 1>
<where, 1>
<she, 1>
<also, 1>
<sells, 1>
<cookies, 1>
The mapping function emits each word in the sentence as a key, with the constant number
1 as the value for each key.
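The map step can be sketched as a single Python function. The lowercasing mirrors the listing, where `She` is emitted as `she`; splitting on whitespace is an assumption, since the text does not say how words are separated or how punctuation is handled. The name `word_count_map` is hypothetical.

```python
def word_count_map(key, value):
    """Map one (offset, line) record to a list of (word, 1) pairs.

    The offset key is not needed for counting words, so it is ignored.
    """
    return [(word.lower(), 1) for word in value.split()]


line = "She sells sea shells on the sea shore where she also sells cookies"
for word, count in word_count_map(0, line):
    print(f"<{word}, {count}>")
```

Running it prints the thirteen pairs listed above, including the repeated `<sea, 1>`, `<she, 1>` and `<sells, 1>` pairs; the mapper does not combine duplicates, because summing the 1s per word is left to the later stages of the job.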