MapReduce Types
The map and reduce functions in Hadoop MapReduce have the following general form:
map: (K1, V1) → list(K2, V2)
reduce: (K2, list(V2)) → list(K3, V3)
In general, the map input key and value types (K1 and V1) are different from the map output types (K2 and V2). However, the reduce input must have the same types as the map output, although the reduce output types may be different again (K3 and V3). The Java API mirrors this general form:
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  public class Context extends MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...
  }

  protected void map(KEYIN key, VALUEIN value,
                     Context context) throws IOException, InterruptedException {
    // ...
  }
}

public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  public class Context extends ReducerContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    // ...
  }

  protected void reduce(KEYIN key, Iterable<VALUEIN> values,
                        Context context) throws IOException, InterruptedException {
    // ...
  }
}
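To make the data flow behind those signatures concrete, here is a minimal sketch in plain Java, with no Hadoop dependency; the names ToyMapper, ToyReducer, and run are hypothetical stand-ins, not Hadoop API. The run method plays the role of the framework: it applies map to every (K1, V1) input pair, groups the emitted (K2, V2) pairs by key (the shuffle), and feeds each (K2, list(V2)) group to reduce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

public class ToyMapReduce {

    // map: (K1, V1) -> list(K2, V2)
    interface ToyMapper<K1, V1, K2, V2> {
        void map(K1 key, V1 value, BiConsumer<K2, V2> emit);
    }

    // reduce: (K2, list(V2)) -> list(K3, V3)
    interface ToyReducer<K2, V2, K3, V3> {
        void reduce(K2 key, Iterable<V2> values, BiConsumer<K3, V3> emit);
    }

    static <K1, V1, K2, V2, K3, V3> Map<K3, V3> run(
            Map<K1, V1> input,
            ToyMapper<K1, V1, K2, V2> mapper,
            ToyReducer<K2, V2, K3, V3> reducer) {
        // Shuffle: group the map output values by key, as the framework
        // does between the map and reduce phases.
        Map<K2, List<V2>> groups = new LinkedHashMap<>();
        input.forEach((k, v) -> mapper.map(k, v,
                (k2, v2) -> groups.computeIfAbsent(k2, x -> new ArrayList<>()).add(v2)));

        Map<K3, V3> output = new LinkedHashMap<>();
        groups.forEach((k2, vs) -> reducer.reduce(k2, vs, output::put));
        return output;
    }

    public static void main(String[] args) {
        // Word count: K1 = Long (offset), V1 = String (line),
        // K2 = K3 = String (word), V2 = V3 = Integer (count).
        Map<Long, String> input = new LinkedHashMap<>();
        input.put(0L, "a b a");
        input.put(6L, "b a");

        Map<String, Integer> counts = run(
            input,
            (Long offset, String line, BiConsumer<String, Integer> emit) -> {
                for (String w : line.split(" ")) emit.accept(w, 1);
            },
            (String word, Iterable<Integer> ones, BiConsumer<String, Integer> emit) -> {
                int n = 0;
                for (int one : ones) n += one;
                emit.accept(word, n);
            });

        System.out.println(counts); // {a=3, b=2}
    }
}
```

Note how each type variable appears in exactly one place on each side of the shuffle: the mapper's output types must line up with the reducer's input types, but the input types (K1, V1) and final output types (K3, V3) are free to differ.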
The context objects are used for emitting key-value pairs, and they are parameterized by the output types, so that the signature of the write() method is:

public void write(KEYOUT key, VALUEOUT value)
    throws IOException, InterruptedException
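The payoff of parameterizing the context by the output types is that mismatched emits are caught at compile time. A small sketch in plain Java (ToyContext is a hypothetical stand-in, not Hadoop's Context) makes the point:

```java
import java.util.ArrayList;
import java.util.List;

// A toy stand-in for a context parameterized by the output types.
class ToyContext<KEYOUT, VALUEOUT> {
    final List<Object[]> emitted = new ArrayList<>();

    // write() accepts only the declared output key and value types.
    public void write(KEYOUT key, VALUEOUT value) {
        emitted.add(new Object[] { key, value });
    }
}

public class WriteDemo {
    public static void main(String[] args) {
        ToyContext<String, Integer> context = new ToyContext<>();
        context.write("1950", 22);   // matches <KEYOUT, VALUEOUT>
        // context.write(22, "1950"); // would not compile: types reversed
        System.out.println(context.emitted.size());
    }
}
```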
Since Mapper and Reducer are separate classes, the type parameters have different scopes, and the actual type argument of KEYIN (say) in the Mapper may be different from the type argument of the type parameter of the same name (KEYIN) in the Reducer. For instance, in the maximum temperature example from earlier chapters, KEYIN is replaced by LongWritable for the Mapper and by Text for the Reducer.
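The different bindings of KEYIN can be simulated in plain Java, using Long and String as stand-ins for LongWritable and Text (the class and method names below are hypothetical, and the data is made up for illustration). The mapper phase consumes (Long offset, String line) pairs, so its KEYIN is Long; the reducer phase consumes the (String year, Integer temperature) pairs the mapper emitted, so its KEYIN is String:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MaxTemperatureTypes {

    // Simulates both phases: map (Long, String) -> (String, Integer),
    // shuffle by year, then reduce each group to its maximum.
    static Map<String, Integer> maxima(Map<Long, String> input) {
        // Map phase: the mapper's KEYIN is the Long file offset.
        Map<String, List<Integer>> groups = new TreeMap<>();
        input.forEach((offset, line) -> {
            String[] fields = line.split(" ");   // "year temperature"
            groups.computeIfAbsent(fields[0], y -> new ArrayList<>())
                  .add(Integer.parseInt(fields[1]));
        });

        // Reduce phase: the reducer's KEYIN is the String year.
        Map<String, Integer> result = new TreeMap<>();
        groups.forEach((year, temps) -> result.put(year, Collections.max(temps)));
        return result;
    }

    public static void main(String[] args) {
        Map<Long, String> input = new LinkedHashMap<>();
        input.put(0L, "1950 0");
        input.put(10L, "1950 22");
        input.put(20L, "1949 111");

        System.out.println(maxima(input)); // {1949=111, 1950=22}
    }
}
```

The offset keys are consumed by the map phase and never reach the reducer, which is exactly why the two KEYIN parameters are free to be bound to different types.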