AD 9
AD 12
AD 7
AD 28
AD 14
AE 20
AE 7
AE 35
AE 11
AE 12
AE 24
AE 4
AE 16
AE 26
AE 11
AE 4
AE 23
AE 12
AE 16
AE 10
AG 18
AG 12
AG 8
AG 14
AG 24
AG 20
AG 7
AG 3
AI 10
AM 18
AN 5
AN 26
Under the Streaming API, the reducer sees this text data on STDIN. We have to code
our reducer to recover the key/value pairs by splitting each line at the tab character.
Sorting has “grouped” together the records that share a key. As you read each line from
STDIN, you're responsible for keeping track of the boundary between records of different
keys. Note that although the keys are sorted, the values within a key don't follow any
particular order. Finally, the reducer must perform its stated computation, which in this
case is calculating the average value for each key. Listing 4.8 gives the complete reducer
in Python.
Listing 4.8 AverageByAttributeReducer.py
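#!/usr/bin/env python
import sys

# Running total and count for the key currently being read
(last_key, total, count) = (None, 0.0, 0)

for line in sys.stdin:
    (key, val) = line.rstrip("\n").split("\t", 1)
    # A change of key marks the end of the previous key's records
    if last_key is not None and last_key != key:
        print(last_key + "\t" + str(total / count))
        (total, count) = (0.0, 0)
    last_key = key
    total += float(val)
    count += 1

# Emit the average for the final key, if any input was read
if last_key is not None:
    print(last_key + "\t" + str(total / count))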
The program keeps a running sum and count for each key. When it detects a new key
in the input stream or reaches the end of the input, it computes the average for the
previous key and writes it to STDOUT. After running the entire MapReduce job, we can
easily check the first few results by hand against the input shown earlier. For AD, for
example, (9 + 12 + 7 + 28 + 14) / 5 = 14.0.
AD 14.0
AE 15.4
AG 13.25
AI 10.0
AM 18.0
AN 9.625
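The boundary tracking in Listing 4.8 can also be delegated to the standard library. The following is a minimal sketch, not the book's listing. It assumes the same tab-separated, key-sorted input and uses itertools.groupby to collect each run of records that share a key. The names parse and average_by_key are hypothetical helpers introduced here for illustration, and the sample records are the AD, AI, and AM lines from the input shown earlier.

```python
from itertools import groupby


def parse(lines):
    """Yield (key, value) pairs from tab-separated lines, skipping blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        key, val = line.split("\t", 1)
        yield key, float(val)


def average_by_key(lines):
    """Yield (key, average) for each run of consecutive lines sharing a key."""
    # groupby only merges adjacent records, so the input must already be sorted by key.
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        values = [val for _, val in group]
        yield key, sum(values) / len(values)


# Sample records copied from the sorted input above (assumed tab-separated).
sample = [
    "AD\t9\n", "AD\t12\n", "AD\t7\n", "AD\t28\n", "AD\t14\n",
    "AI\t10\n",
    "AM\t18\n",
]

for key, avg in average_by_key(sample):
    print(key + "\t" + str(avg))
```

In a Streaming job the same function would be fed sys.stdin instead of the sample list. Because groupby only groups consecutive records, it relies on the sort order described above. It also collects all of one key's values into a list before averaging, whereas the running sum in Listing 4.8 needs constant memory per key.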
 