MapReduce Types
The map and reduce functions in Hadoop MapReduce have the following general form:
map: (K1, V1) → list(K2, V2)
reduce: (K2, list(V2)) → list(K3, V3)
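This type flow can be sketched in miniature with plain Java generics. The following is an illustrative in-memory model, not Hadoop's actual API: the class `TypeFlowSketch`, the `Pair` record, and the `mapReduce` method are all invented for the sketch, which simply shows how (K1, V1) pairs become grouped (K2, list(V2)) inputs for reduce.

```java
import java.util.*;
import java.util.function.BiFunction;

// A minimal in-memory sketch of the MapReduce type flow (plain Java, not
// Hadoop): map turns each (K1, V1) pair into a list of (K2, V2) pairs, the
// framework groups the V2 values by K2, and reduce turns each
// (K2, list(V2)) group into (K3, V3) pairs.
public class TypeFlowSketch {

    // A simple immutable key-value pair.
    record Pair<K, V>(K key, V value) {}

    static <K1, V1, K2, V2, K3, V3> List<Pair<K3, V3>> mapReduce(
            List<Pair<K1, V1>> input,
            BiFunction<K1, V1, List<Pair<K2, V2>>> map,
            BiFunction<K2, List<V2>, List<Pair<K3, V3>>> reduce) {
        // "Shuffle" phase: group map outputs by their K2 key.
        Map<K2, List<V2>> groups = new LinkedHashMap<>();
        for (Pair<K1, V1> in : input) {
            for (Pair<K2, V2> out : map.apply(in.key(), in.value())) {
                groups.computeIfAbsent(out.key(), k -> new ArrayList<>())
                      .add(out.value());
            }
        }
        // Reduce phase: each (K2, list(V2)) group yields (K3, V3) pairs.
        List<Pair<K3, V3>> result = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> e : groups.entrySet()) {
            result.addAll(reduce.apply(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Max-temperature-style data: K1 = line offset, V1 = "year temp" line;
        // K2 = year, V2 = temperature; K3 = year, V3 = maximum temperature.
        List<Pair<Long, String>> lines = List.of(
                new Pair<>(0L, "1950 22"),
                new Pair<>(8L, "1950 35"),
                new Pair<>(16L, "1951 31"));
        List<Pair<String, Integer>> maxima = mapReduce(
                lines,
                (Long offset, String line) -> {
                    String[] parts = line.split(" ");
                    return List.of(new Pair<>(parts[0],
                                              Integer.parseInt(parts[1])));
                },
                (String year, List<Integer> temps) ->
                        List.of(new Pair<>(year, Collections.max(temps))));
        System.out.println(maxima);
    }
}
```

Note how each of the six type parameters appears exactly once on the input side and once on the output side of the pipeline, mirroring the two signatures above.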
In general, the map input key and value types (K1 and V1) are different from the map output types (K2 and V2). However, the reduce input must have the same types as the map output, although the reduce output types may be different again (K3 and V3). The Java API mirrors this general form:
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
  public class Context extends MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...
  }
  protected void map(KEYIN key, VALUEIN value,
                     Context context) throws IOException, InterruptedException {
    // ...
  }
}
public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
  public class Context extends ReducerContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...
  }
  protected void reduce(KEYIN key, Iterable<VALUEIN> values,
                        Context context) throws IOException, InterruptedException {
    // ...
  }
}
The context objects are used for emitting key-value pairs, and they are parameterized by
the output types so that the signature of the write() method is:
public void write(KEYOUT key, VALUEOUT value)
    throws IOException, InterruptedException
Since Mapper and Reducer are separate classes, their type parameters have separate scopes: the actual type argument for KEYIN (say) in the Mapper may differ from the actual type argument for the parameter of the same name (KEYIN) in the Reducer. For instance, in the maximum temperature example from earlier chapters, KEYIN is replaced by LongWritable for the Mapper and by Text for the Reducer.
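The scoping point can be seen with stand-in classes. This is a self-contained sketch, not Hadoop code: SketchMapper and SketchReducer are invented names mimicking Hadoop's Mapper and Reducer, and plain Long and String stand in for LongWritable and Text so the example compiles without Hadoop on the classpath.

```java
// Stand-ins for Hadoop's generic Mapper and Reducer classes; all names here
// are invented for this sketch.
class SketchMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {}
class SketchReducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {}

// In the maximum-temperature job, the Mapper binds KEYIN to a file offset
// (LongWritable in real Hadoop, Long here)...
class MaxTemperatureMapper extends SketchMapper<Long, String, String, Integer> {}

// ...while the Reducer binds its own KEYIN to a year (Text in real Hadoop,
// String here): the same parameter name, but a different scope and a
// different actual type argument.
class MaxTemperatureReducer extends SketchReducer<String, Integer, String, Integer> {}
```

The two KEYIN parameters never interact; the only constraint the framework imposes is that the Mapper's output types match the Reducer's input types (String and Integer above).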