NOTE
It's worth drawing out a design difference between Streaming and the Java MapReduce API. The Java
API is geared toward processing input one record at a time: the framework calls the
map() method on your Mapper for each record in the input. With Streaming, the map program
decides how to process the input; for example, it could easily read and process multiple lines at a
time, since it's in control of the reading. The user's Java map implementation is “pushed” records, but it's
still possible to consider multiple lines at a time by accumulating previous lines in an instance variable in
the Mapper.[23] In this case, you need to implement the close() method so that you know when the
last record has been read and can finish processing the last group of lines.
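To make the point about Streaming controlling its own reading more concrete, here is a minimal sketch of a helper that consumes input several lines at a time. The method name each_line_group and the default group size of 2 are hypothetical, chosen only for illustration; they are not part of the book's examples or of Hadoop Streaming itself.

```ruby
require 'stringio'

# Hypothetical helper: yields the input in groups of `size` lines rather than
# one line at a time. If the input does not divide evenly, the final group
# is shorter.
def each_line_group(io, size = 2)
  io.each_line.each_slice(size) do |lines|
    yield lines.map(&:chomp)
  end
end

# In a Streaming map program `io` would be STDIN; a StringIO stands in here
# so the sketch can be run on its own.
each_line_group(StringIO.new("a\nb\nc\n")) do |group|
  puts group.join("+")
end
```

Because the program owns the read loop, no end-of-input hook is needed: the last, possibly shorter, group is simply the final one yielded.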
Because the script just operates on standard input and output, it's trivial to test the script
without using Hadoop, simply by using Unix pipes:
% cat input/ncdc/sample.txt | ch02-mr-intro/src/main/ruby/max_temperature_map.rb
1950 +0000
1950 +0022
1950 -0011
1949 +0111
1949 +0078
The reduce function shown in Example 2-8 is a little more complex.
Example 2-8. Reduce function for maximum temperature in Ruby
#!/usr/bin/env ruby

# Track the current key and the maximum value seen so far for that key.
last_key, max_val = nil, -1000000
STDIN.each_line do |line|
  key, val = line.split("\t")
  if last_key && last_key != key
    # The key changed: emit the result for the finished group, then start a new one.
    puts "#{last_key}\t#{max_val}"
    last_key, max_val = key, val.to_i
  else
    last_key, max_val = key, [max_val, val.to_i].max
  end
end
# Emit the final group, if any input was read.
puts "#{last_key}\t#{max_val}" if last_key
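The key-group logic of Example 2-8 can also be exercised without standard input, which makes it easier to check by hand. The sketch below is not the book's code: max_per_key is a hypothetical method name, and it uses Ruby's Enumerable#slice_when to split the records at each key change instead of tracking last_key explicitly. Like Example 2-8, it assumes the lines arrive already sorted by key.

```ruby
# Hypothetical helper: takes tab-separated "key\tvalue" lines that are already
# sorted by key and returns one [key, max_value] pair per key group.
def max_per_key(lines)
  pairs = lines.map { |line| line.chomp.split("\t") }
  # Start a new group whenever the key differs from the previous record's key.
  pairs.slice_when { |prev, curr| prev[0] != curr[0] }.map do |group|
    [group.first[0], group.map { |_, val| val.to_i }.max]
  end
end

# The five sample records from the map output, sorted by year.
sorted = [
  "1949\t+0078\n", "1949\t+0111\n",
  "1950\t-0011\n", "1950\t+0000\n", "1950\t+0022\n"
]
max_per_key(sorted).each { |key, max| puts "#{key}\t#{max}" }
```

For these records the sketch prints 1949 with 111 and 1950 with 22. If the input were not sorted, the same year could appear in more than one group, which is why the ordering matters.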
Again, the program iterates over lines from standard input, but this time we have to store
some state as we process each key group. In this case, the keys are the years, and we store
the last key seen and the maximum temperature seen so far for that key. The MapReduce
framework ensures that the keys are ordered, so we know that if a key is different from the
previous one, we have moved into a new key group. In contrast to the Java API, where