MapReduce Types and Formats - Hadoop: The Definitive Guide

Database Reference

In-Depth Information

Similarly, even though the map output types and the reduce input types must match, this is

not enforced by the Java compiler.

The type parameters are named differently from the abstract types ( KEYIN versus K1 , and

so on), but the form is the same.

If a combiner function is used, then it has the same form as the reduce function (and is an

implementation of Reducer ), except its output types are the intermediate key and value

types ( K2 and V2 ), so they can feed the reduce function:

map: (K1, V1) → list(K2, V2)

combiner: (K2, list(V2)) → list(K2, V2)

reduce: (K2, list(V2)) → list(K3, V3)

Often the combiner and reduce functions are the same, in which case K3 is the same as

K2 , and V3 is the same as V2 .

The partition function operates on the intermediate key and value types ( K2 and V2 ) and

returns the partition index. In practice, the partition is determined solely by the key (the

value is ignored):

partition: (K2, V2) → integer

Or in Java:

public abstract class Partitioner < KEY , VALUE > {

public abstract int getPartition ( KEY key , VALUE value , int

numPartitions );

}

Search WWH ::

Custom Search

Home