The -mapper and -reducer arguments take a command or a Java class. A combiner
may optionally be specified using the -combiner argument.
Keys and values in Streaming
A Streaming application can control the separator that is used when a key-value pair is
turned into a series of bytes and sent to the map or reduce process over standard input.
The default is a tab character, but it is useful to change it when the
keys or values themselves contain tab characters.
Similarly, when the map or reduce writes out key-value pairs, they may be separated by a
configurable separator. Furthermore, the key from the output can be composed of more
than the first field: it can be made up of the first n fields (defined by
stream.num.map.output.key.fields or
stream.num.reduce.output.key.fields), with the value being the remaining
fields. For example, if the output from a Streaming process was a,b,c (with a comma as
the separator), and n was 2, the key would be parsed as a,b and the value as c.
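The parsing rule just described can be sketched in a few lines of Python. This is only an illustration of the rule, not Hadoop's own code: the function name split_key_value and its parameters are hypothetical, and treating a line with too few fields as "all key, empty value" is an assumption made for the sketch.

```python
def split_key_value(line, separator="\t", num_key_fields=1):
    """Split one output line into (key, value).

    The key is made of the first num_key_fields fields; the value is
    whatever remains. Hypothetical helper, for illustration only.
    """
    fields = line.split(separator)
    if len(fields) <= num_key_fields:
        # Assumption for this sketch: with too few fields, the whole
        # line becomes the key and the value is empty.
        return line, ""
    key = separator.join(fields[:num_key_fields])
    value = separator.join(fields[num_key_fields:])
    return key, value


# The example from the text: comma separator, n = 2.
print(split_key_value("a,b,c", separator=",", num_key_fields=2))
```

With the defaults (tab separator, one key field), the same function splits a line at its first tab.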
Separators may be configured independently for maps and reduces. The properties are
listed in Table 8-3 and shown in a diagram of the data flow path in Figure 8-1.
These settings do not have any bearing on the input and output formats. For example, if
stream.reduce.output.field.separator were set to be a colon, say, and the
reduce stream process wrote the line a:b to standard out, the Streaming reducer would
know to extract the key as a and the value as b. With the standard
TextOutputFormat, this record would be written to the output file with a tab separating a and
b. You can change the separator that TextOutputFormat uses by setting
mapreduce.output.textoutputformat.separator.
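The two separators act at different stages, which a small Python simulation may make clearer. It is a stand-in for the combined effect of the Streaming reducer and TextOutputFormat, not their implementation; the function rewrite_record and its parameter names are hypothetical.

```python
def rewrite_record(line, stream_separator=":", output_separator="\t"):
    """Simulate the two separators acting on one reducer output line.

    stream_separator plays the role of stream.reduce.output.field.separator;
    output_separator plays the role of the separator TextOutputFormat uses.
    Hypothetical helper, for illustration only.
    """
    # Stage 1: Streaming splits the line written by the reduce process
    # into a key and a value at the first stream separator.
    key, _, value = line.partition(stream_separator)
    # Stage 2: the output format writes the pair with its own separator.
    return key + output_separator + value


# The reduce process writes "a:b"; the file gets a tab-separated record.
print(repr(rewrite_record("a:b")))
# Changing only the output format's separator changes the file, not the parsing.
print(repr(rewrite_record("a:b", output_separator=",")))
```

The point of the sketch is that changing one separator leaves the other untouched: the first governs how Streaming reads the process's output, the second how the record is written to the file.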