Database Reference
In-Depth Information
% cat input/ncdc/sample.txt | \
ch02-mr-intro/src/main/python/max_temperature_map.py | \
sort | ch02-mr-intro/src/main/python/max_temperature_reduce.py
1949 111
1950 22
[ 20 ] Functions with this property are called commutative and associative . They are also sometimes referred
to as distributive , such as by Jim Gray et al.'s “Data Cube: A Relational Aggregation Operator Generalizing
Group-By, Cross-Tab, and Sub-Totals,” February1995.
[ 21 ] This is a factor of seven faster than the serial run on one machine using awk . The main reason it wasn't
proportionately faster is because the input data wasn't evenly partitioned. For convenience, the input files
were gzipped by year, resulting in large files for later years in the dataset, when the number of weather re-
cords was much higher.
[ 22 ] Hadoop Pipes is an alternative to Streaming for C++ programmers. It uses sockets to communicate with
the process running the C++ map or reduce function.
[ 23 ] Alternatively, you could use “pull”-style processing in the new MapReduce API; see Appendix D .
[ 24 ] As an alternative to Streaming, Python programmers should consider Dumbo , which makes the Stream-
ing MapReduce interface more Pythonic and easier to use.
Search WWH ::




Custom Search