MapReduce Library Classes
Hadoop comes with a library of mappers and reducers for commonly used functions. They
are listed with brief descriptions in Table 9-8. For further information on how to use them,
consult their Java documentation.
Table 9-8. MapReduce library classes
Classes
Description
ChainMapper, ChainReducer
Run a chain of mappers in a single mapper, and a reducer followed by a chain of mappers in a single reducer, respectively. (Symbolically, M+RM*, where M is a mapper and R is a reducer.) This can substantially reduce the amount of disk I/O incurred compared to running multiple MapReduce jobs.
FieldSelectionMapReduce (old API), FieldSelectionMapper and FieldSelectionReducer (new API)
A mapper and reducer that can select fields (like the Unix cut command) from the input keys and values and emit them as output keys and values.
IntSumReducer, LongSumReducer
Reducers that sum integer values to produce a total for every key.
InverseMapper
A mapper that swaps keys and values.
MultithreadedMapRunner (old API), MultithreadedMapper (new API)
A mapper (or map runner in the old API) that runs mappers concurrently in separate threads. Useful for mappers that are not CPU-bound.
TokenCounterMapper
A mapper that tokenizes the input value into words (using Java's StringTokenizer) and emits each word along with a count of 1.
RegexMapper
A mapper that finds matches of a regular expression in the input value and emits the matches along with a count of 1.
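As a rough illustration of what the TokenCounterMapper and IntSumReducer entries in Table 9-8 describe, the following sketch imitates the two steps in plain Java. It is not the Hadoop implementation and does not use the Hadoop API; the class name TokenCountSketch and its map and reduce methods are hypothetical names chosen for this example. Only the use of StringTokenizer and the "emit each word with a count of 1, then sum per key" behavior come from the table.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration only: it mimics the behavior described in the table
// and is not the code of the Hadoop library classes.
class TokenCountSketch {

    // Map step: tokenize the input value with StringTokenizer and emit
    // a (word, 1) pair per token, as described for TokenCounterMapper.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleImmutableEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    // Reduce step: sum the integer values seen for each key,
    // as described for IntSumReducer. A TreeMap keeps keys sorted.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"to be or not", "to be"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs));
    }
}
```

Run on the two sample lines, the sketch prints {be=2, not=1, or=1, to=2}. In a real job the library classes would be set on the job directly rather than reimplemented; consult their Java documentation for the details.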
[61] One commonly used workaround for this problem, particularly in text-based Streaming applications,
is to add an offset to eliminate all negative numbers and to left pad with zeros so all numbers are the same
number of characters. However, see Streaming for another approach.
[62] See Sorting and merging SequenceFiles for how to do the same thing using the sort program example that
comes with Hadoop.
[63] A better answer is to use Pig (Sorting Data), Hive (Sorting and Aggregating), Crunch, or Spark, all of
which can sort with a single command.
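The workaround in footnote 61 (add an offset to remove negative numbers, then left pad with zeros to a fixed width) can be sketched in plain Java as follows. The class name SortableNumberSketch, the offset of 1000, and the width of 4 are all assumptions made for this illustration; a real application would pick an offset and width that cover its own value range.

```java
import java.util.Arrays;
import java.util.Locale;

// Illustrative sketch of the offset-and-pad workaround; names and constants
// are hypothetical and not taken from Hadoop.
class SortableNumberSketch {
    // Assumed range: values from -1000 to 8999, so shifted values fit in 4 digits.
    static final int OFFSET = 1000;
    static final int WIDTH = 4;

    // Shift the value to be non-negative, then left pad with zeros so that
    // text (lexicographic) order matches numeric order.
    static String encode(int value) {
        int shifted = value + OFFSET;
        if (shifted < 0 || shifted > 9999) {
            throw new IllegalArgumentException("value out of assumed range: " + value);
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", shifted);
    }

    // Reverse the encoding to recover the original number.
    static int decode(String key) {
        return Integer.parseInt(key) - OFFSET;
    }

    public static void main(String[] args) {
        int[] values = {3, -5, 20, -100};
        String[] keys = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            keys[i] = encode(values[i]);
        }
        Arrays.sort(keys); // plain string sort, as a text-based tool would do
        for (String key : keys) {
            System.out.println(key + " -> " + decode(key));
        }
    }
}
```

Because every encoded key has the same length and no sign character, sorting the keys as text yields the values in numeric order.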