-combiner ch02-mr-intro/src/main/ruby/max_temperature_reduce.rb \
-reducer ch02-mr-intro/src/main/ruby/max_temperature_reduce.rb
Note also the use of -files, which we use when running Streaming programs on the
cluster to ship the scripts to it.
Python
Streaming supports any programming language that can read from standard input and
write to standard output, so for readers more familiar with Python, here's the same
example.
Example 2-9. Map function for maximum temperature in Python
#!/usr/bin/env python

import re
import sys

for line in sys.stdin:
  val = line.strip()
  (year, temp, q) = (val[15:19], val[87:92], val[92:93])
  if (temp != "+9999" and re.match("[01459]", q)):
    print "%s\t%s" % (year, temp)
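The map function above is written for Python 2 (it uses the print statement). As a rough sketch only, and not part of the book's example code, the same field extraction could be written for Python 3 as plain functions that are easy to test without a cluster. The names MISSING, map_line, and map_lines are hypothetical; the offsets and quality codes are copied from the listing above.

```python
import re

# Sentinel for a missing temperature reading, as in the listing above.
MISSING = "+9999"


def map_line(line):
    """Return (year, temp) for a usable record, or None to skip it."""
    val = line.strip()
    # Same fixed-width offsets as the original map function.
    year, temp, q = val[15:19], val[87:92], val[92:93]
    if temp != MISSING and re.match("[01459]", q):
        return year, temp
    return None


def map_lines(lines):
    """Yield one tab-separated 'year<TAB>temp' string per usable record."""
    for line in lines:
        pair = map_line(line)
        if pair is not None:
            yield "%s\t%s" % pair
```

To use this as a Streaming script, one would loop over map_lines(sys.stdin) and print each result, which reproduces the behavior of the Python 2 listing.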
Example 2-10. Reduce function for maximum temperature in Python
#!/usr/bin/env python

import sys

(last_key, max_val) = (None, -sys.maxint)
for line in sys.stdin:
  (key, val) = line.strip().split("\t")
  if last_key and last_key != key:
    print "%s\t%s" % (last_key, max_val)
    (last_key, max_val) = (key, int(val))
  else:
    (last_key, max_val) = (key, max(max_val, int(val)))

if last_key:
  print "%s\t%s" % (last_key, max_val)
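The reduce function above is also Python 2 code: besides the print statement, it relies on sys.maxint, which Python 3 does not provide. One possible Python 3 formulation, offered here as an illustrative sketch rather than the book's method, groups consecutive lines by key with itertools.groupby instead of tracking last_key by hand. Like the original, it assumes the reducer input arrives sorted by key. The names parse and reduce_lines are hypothetical.

```python
from itertools import groupby


def parse(lines):
    """Yield (key, integer value) pairs from tab-separated input lines."""
    for line in lines:
        key, val = line.strip().split("\t")
        yield key, int(val)


def reduce_lines(lines):
    """Yield 'key<TAB>max' for each run of consecutive identical keys.

    Assumes the input is already sorted by key, as the listing above does.
    """
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        yield "%s\t%s" % (key, max(v for _, v in group))
```

As with the mapper sketch, a Streaming script would print each string produced by reduce_lines(sys.stdin). Taking the maximum of each group directly also removes the need for an initial sentinel value.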
We can test the programs and run the job in the same way we did in Ruby. For example, to
run a test: