Hadoop Streaming
License
Apache License, Version 2.0
Activity
Medium
Purpose
Write MapReduce code without Java
Official Page
http://hadoop.apache.org/docs/r1.2.1/streaming.html
Hadoop Integration
Fully Integrated
You have some data, you have an idea of what you want to do with it, you understand the
concepts of MapReduce, but you don't have solid Java or MapReduce programming expertise, and
the problem does not really fit any of the other major tools that Hadoop has to offer. Your
solution may be Hadoop Streaming, which lets you write your MapReduce code as any Linux
program that reads from stdin and writes to stdout.
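The following minimal sketch of a word-count mapper in Python illustrates that stdin/stdout contract. It is not the code from the tutorial referenced in this entry. The function name `map_lines` and the lowercasing of words are illustrative choices of our own. The sketch assumes plain-text input and emits one `word<TAB>1` line per word. By default, Streaming treats everything before the first tab as the key and the rest of the line as the value.

```python
#!/usr/bin/env python3
"""Hypothetical word-count mapper for Hadoop Streaming (illustrative sketch)."""
import sys
from typing import Iterable, Iterator


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            # Lowercasing is an illustrative choice so "The" and "the" share a key.
            # Streaming treats the text before the first tab as the key.
            yield f"{word.lower()}\t1"


if __name__ == "__main__":
    # Read raw input lines from stdin; write key-value records to stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

Because the script touches only stdin and stdout, you can run it by hand on a small sample file before handing it to Hadoop.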
You still need to write mappers and reducers, but in the language of your choice. Your mapper
will likely read lines from a text file and emit key-value pairs, one per line, with the key
and value separated by a tab character. The MapReduce infrastructure handles the shuffle
phase, which sorts the mapper output by key. Your reducer then reads the sorted pairs from
standard input (stdin), does its processing, and writes its output to standard output (stdout).
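The reducer side can also be sketched in Python. This hypothetical word-count reducer is our own illustration, and the names `parse` and `reduce_lines` are assumptions. It relies on Streaming delivering its input sorted by key, so all counts for a given word arrive on adjacent lines. It also assumes every input line has the form `key<TAB>integer`.

```python
#!/usr/bin/env python3
"""Hypothetical word-count reducer for Hadoop Streaming (illustrative sketch)."""
import sys
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def parse(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Split each 'key<TAB>value' line at the first tab (value assumed to be an int)."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, int(value)


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Sum the values for each key; input must already be sorted by key."""
    # groupby merges only *adjacent* records with equal keys, so it depends on
    # the sort performed during the shuffle phase.
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        total = sum(value for _, value in group)
        yield f"{key}\t{total}"


if __name__ == "__main__":
    # Read sorted key-value records from stdin; write per-key totals to stdout.
    for record in reduce_lines(sys.stdin):
        print(record)
```

If the input were not sorted, `groupby` would emit separate partial totals for the same word, which shows why the shuffle's sort matters. A common way to try such a reducer locally is a shell pipeline in which `sort` stands in for the shuffle: run the sample input through the mapper, pipe the result through `sort`, and then pipe it into the reducer.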
The reference in the following “Tutorial Links” section shows a WordCount application in
Hadoop Streaming using Python.
Is Streaming going to be as performant as native Java code? Almost certainly not. But if your
organization has Ruby, Python, or similar skills, you will definitely get better results with
Streaming than by sending your developers off to learn Java before doing any MapReduce projects.