MapReduce Types
The map and reduce functions in Hadoop MapReduce have the following general form:
map: (K1, V1) → list(K2, V2)
reduce: (K2, list(V2)) → list(K3, V3)
In general, the map input key and value types (K1 and V1) are different from the map output types (K2 and V2). However, the reduce input must have the same types as the map output, although the reduce output types may be different again (K3 and V3). The Java API mirrors this general form:
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  public class Context extends MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...
  }

  protected void map(KEYIN key, VALUEIN value,
                     Context context) throws IOException, InterruptedException {
    // ...
  }
}

public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  public class Context extends ReducerContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...
  }

  protected void reduce(KEYIN key, Iterable<VALUEIN> values,
                        Context context) throws IOException, InterruptedException {
    // ...
  }
}
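To make the data flow behind those signatures concrete, here is a minimal sketch in plain Java, with no Hadoop dependency; the names ToyMapper, ToyReducer, and run are hypothetical stand-ins, not Hadoop API. The run method plays the role of the framework: it applies map to every (K1, V1) input pair, groups the emitted (K2, V2) pairs by key (the shuffle), and feeds each (K2, list(V2)) group to reduce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class ToyMapReduce {

    // map: (K1, V1) -> list(K2, V2)
    interface ToyMapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
    }

    // reduce: (K2, list(V2)) -> list(K3, V3)
    interface ToyReducer<K2, V2, K3, V3> {
        void reduce(K2 key, Iterable<V2> values, BiConsumer<K3, V3> emit);
    }

    static <K1, V1, K2, V2, K3, V3> Map<K3, V3> run(
            Map<K1, V1> input,
            ToyMapper<K1, V1, K2, V2> mapper,
            ToyReducer<K2, V2, K3, V3> reducer) {
        // Shuffle: group the map output values by key, as the framework
        // does between the map and reduce phases.
        Map<K2, List<V2>> groups = new LinkedHashMap<>();
        input.forEach((k, v) -> mapper.map(k, v,
                (k2, v2) -> groups.computeIfAbsent(k2, x -> new ArrayList<>()).add(v2)));

        Map<K3, V3> output = new LinkedHashMap<>();
        groups.forEach((k2, vs) -> reducer.reduce(k2, vs, output::put));
        return output;
    }

    public static void main(String[] args) {
        // Word count: K1 = Long (offset), V1 = String (line),
        // K2 = K3 = String (word), V2 = V3 = Integer (count).
        Map<Long, String> input = new LinkedHashMap<>();
        input.put(0L, "a b a");
        input.put(6L, "b a");

        Map<String, Integer> counts = run(
            input,
            (Long offset, String line, BiConsumer<String, Integer> emit) -> {
                for (String w : line.split(" ")) emit.accept(w, 1);
            },
            (String word, Iterable<Integer> ones, BiConsumer<String, Integer> emit) -> {
                int n = 0;
                for (int one : ones) n += one;
                emit.accept(word, n);
            });

        System.out.println(counts); // {a=3, b=2}
    }
}
```

Note how each type variable appears in exactly one place on each side of the shuffle: the mapper's output types must line up with the reducer's input types, but the input types (K1, V1) and final output types (K3, V3) are free to differ.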
The context objects are used for emitting key-value pairs, and they are parameterized by the output types, so that the signature of the write() method is:

public void write(KEYOUT key, VALUEOUT value)
    throws IOException, InterruptedException
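The payoff of parameterizing the context by the output types is that mismatched emits are caught at compile time. A small sketch in plain Java (ToyContext is a hypothetical stand-in, not Hadoop's Context) makes the point:

```java
import java.util.ArrayList;
import java.util.List;

// A toy stand-in for a context parameterized by the output types.
class ToyContext<KEYOUT, VALUEOUT> {
    final List<Object[]> emitted = new ArrayList<>();

    // write() accepts only the declared output key and value types.
    public void write(KEYOUT key, VALUEOUT value) {
        emitted.add(new Object[] { key, value });
    }
}

public class WriteDemo {
    public static void main(String[] args) {
        ToyContext<String, Integer> context = new ToyContext<>();
        context.write("1950", 22);   // matches <KEYOUT, VALUEOUT>
        // context.write(22, "1950"); // would not compile: types reversed
        System.out.println(context.emitted.size());
    }
}
```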
Since Mapper and Reducer are separate classes, the type parameters have different scopes, and the actual type argument of KEYIN (say) in the Mapper may be different from the type argument of the type parameter of the same name (KEYIN) in the Reducer. For instance, in the maximum temperature example from earlier chapters, KEYIN is replaced by LongWritable for the Mapper and by Text for the Reducer.
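The different bindings of KEYIN can be simulated in plain Java, using Long and String as stand-ins for LongWritable and Text (the class and method names below are hypothetical, and the data is made up for illustration). The mapper phase consumes (Long offset, String line) pairs, so its KEYIN is Long; the reducer phase consumes the (String year, Integer temperature) pairs the mapper emitted, so its KEYIN is String:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MaxTemperatureTypes {

    // Simulates both phases: map (Long, String) -> (String, Integer),
    // shuffle by year, then reduce each group to its maximum.
    static Map<String, Integer> maxima(Map<Long, String> input) {
        // Map phase: the mapper's KEYIN is the Long file offset.
        Map<String, List<Integer>> groups = new TreeMap<>();
        input.forEach((offset, line) -> {
            String[] fields = line.split(" ");   // "year temperature"
            groups.computeIfAbsent(fields[0], y -> new ArrayList<>())
                  .add(Integer.parseInt(fields[1]));
        });

        // Reduce phase: the reducer's KEYIN is the String year.
        Map<String, Integer> result = new TreeMap<>();
        groups.forEach((year, temps) -> result.put(year, Collections.max(temps)));
        return result;
    }

    public static void main(String[] args) {
        Map<Long, String> input = new LinkedHashMap<>();
        input.put(0L, "1950 0");
        input.put(10L, "1950 22");
        input.put(20L, "1949 111");

        System.out.println(maxima(input)); // {1949=111, 1950=22}
    }
}
```

The offset keys are consumed by the map phase and never reach the reducer, which is exactly why the two KEYIN parameters are free to be bound to different types.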