MapReduce Library Classes
Hadoop comes with a library of mappers and reducers for commonly used functions. They are listed with brief descriptions in Table 9-8. For further information on how to use them, consult their Java documentation.
Table 9-8. MapReduce library classes

| Classes | Description |
| --- | --- |
| ChainMapper, ChainReducer | Run a chain of mappers in a single mapper and a reducer followed by a chain of mappers in a single reducer, respectively. (Symbolically, M+RM*, where M is a mapper and R is a reducer.) This can substantially reduce the amount of disk I/O incurred compared to running multiple MapReduce jobs. |
| FieldSelectionMapReduce (old API); FieldSelectionMapper and FieldSelectionReducer (new API) | A mapper and reducer that can select fields (like the Unix cut command) from the input keys and values and emit them as output keys and values. |
| IntSumReducer, LongSumReducer | Reducers that sum integer values to produce a total for every key. |
| InverseMapper | A mapper that swaps keys and values. |
| MultithreadedMapRunner (old API), MultithreadedMapper (new API) | A mapper (or map runner in the old API) that runs mappers concurrently in separate threads. Useful for mappers that are not CPU-bound. |
| TokenCounterMapper | A mapper that tokenizes the input value into words (using Java's StringTokenizer) and emits each word along with a count of 1. |
| RegexMapper | A mapper that finds matches of a regular expression in the input value and emits the matches along with a count of 1. |
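
To make the TokenCounterMapper and IntSumReducer descriptions in Table 9-8 concrete, the following is a minimal plain-Java sketch of the computation they describe. It does not use any Hadoop classes: the class name WordCountSketch and the methods mapLine and reduce are hypothetical names chosen for illustration, and the shuffle between map and reduce is approximated by collecting all pairs into one list.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustration only: mimics the behavior described in the table,
// not the Hadoop library classes themselves.
final class WordCountSketch {

    // Map step: tokenize the line with StringTokenizer and pair each
    // word with a count of 1, as the TokenCounterMapper row describes.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleImmutableEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    // Reduce step: sum the integer values for every key, as the
    // IntSumReducer row describes. A TreeMap keeps keys sorted.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.addAll(mapLine("to be or not to be"));
        pairs.addAll(mapLine("to do"));
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(reduce(pairs));
    }
}
```

In a real job the two library classes would be set on the job as its mapper and reducer; their Java documentation gives the exact configuration.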
[61] One commonly used workaround for this problem, particularly in text-based Streaming applications, is to add an offset to eliminate all negative numbers and to left-pad with zeros so all numbers are the same number of characters. However, see Streaming for another approach.
[62] See Sorting and merging SequenceFiles for how to do the same thing using the sort program example that comes with Hadoop.
[63] A better answer is to use Pig (Sorting Data), Hive (Sorting and Aggregating), Crunch, or Spark, all of which can sort with a single command.
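
The offset-and-pad workaround from note 61 can be sketched as follows. This is a minimal illustration that assumes the numbers are 32-bit signed integers; the class name SortableKeySketch, the methods encode and decode, and the constants are hypothetical and not part of Hadoop. Adding 2^31 makes every value non-negative, and padding to 10 digits (the width of the largest shifted value, 4294967295) makes string order agree with numeric order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Illustration only: encodes ints as fixed-width text keys whose
// lexicographic order matches their numeric order.
final class SortableKeySketch {

    // Assumption: keys are 32-bit ints, so this offset maps
    // Integer.MIN_VALUE to 0 and removes all negative numbers.
    private static final long OFFSET = -(long) Integer.MIN_VALUE;

    // Number of decimal digits in the largest shifted value.
    private static final int WIDTH = 10;

    // Shift by the offset, then left-pad with zeros to a fixed width.
    static String encode(int value) {
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value + OFFSET);
    }

    // Reverse the encoding to recover the original number.
    static int decode(String key) {
        return (int) (Long.parseLong(key) - OFFSET);
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int n : new int[] {-5, 100, 0, -1000, 7}) {
            keys.add(encode(n));
        }
        Collections.sort(keys); // plain string comparison

        List<Integer> sorted = new ArrayList<>();
        for (String key : keys) {
            sorted.add(decode(key));
        }
        // prints [-1000, -5, 0, 7, 100]
        System.out.println(sorted);
    }
}
```

The fixed width matters as much as the offset: without padding, the string "100" would sort before "7".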