MAPREDUCE SIGNATURES IN THE OLD API
In the old API (see Appendix D), the signatures are very similar and actually name the type parameters K1, V1, and so on, although the constraints on the types are exactly the same in both the old and new APIs:
public interface Mapper<K1, V1, K2, V2> extends JobConfigurable, Closeable {
  void map(K1 key, V1 value,
      OutputCollector<K2, V2> output, Reporter reporter) throws IOException;
}

public interface Reducer<K2, V2, K3, V3> extends JobConfigurable, Closeable {
  void reduce(K2 key, Iterator<V2> values,
      OutputCollector<K3, V3> output, Reporter reporter) throws IOException;
}

public interface Partitioner<K2, V2> extends JobConfigurable {
  int getPartition(K2 key, V2 value, int numPartitions);
}
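To make the old-API contract concrete, the following is a minimal sketch of a mapper that counts tokens in each input line. It is not one of the book's examples, and the class name TokenCountMapper is illustrative; extending MapReduceBase supplies empty configure() and close() implementations, so only map() needs to be written.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// K1 = LongWritable, V1 = Text (from TextInputFormat); K2 = Text, V2 = IntWritable.
public class TokenCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Emit (token, 1) for each whitespace-separated token in the line.
    for (String token : value.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        output.collect(word, ONE);
      }
    }
  }
}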
So much for the theory. How does this help you configure MapReduce jobs? Table 8-1
summarizes the configuration options for the new API (and Table 8-2 does the same for
the old API). It is divided into the properties that determine the types and those that have
to be compatible with the configured types.
Input types are set by the input format. So, for instance, a TextInputFormat generates keys of type LongWritable and values of type Text. The other types are set explicitly by calling the methods on the Job (or JobConf in the old API). If not set explicitly, the intermediate types default to the (final) output types, which default to LongWritable and Text. So, if K2 and K3 are the same, you don't need to call setMapOutputKeyClass(), because it falls back to the type set by calling setOutputKeyClass(). Similarly, if V2 and V3 are the same, you only need to use setOutputValueClass().
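As a hedged illustration of this fallback behavior, the sketch below configures a new-API job whose intermediate and final output types are both (Text, IntWritable); the mapper and reducer classes (WordCountMapper, WordCountReducer) are placeholders, not classes defined in this chapter.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TypeConfigExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "type configuration sketch");
    job.setJarByClass(TypeConfigExample.class);

    // Input types (K1 = LongWritable, V1 = Text) are determined by the input
    // format, not by any setter on the Job.
    job.setInputFormatClass(TextInputFormat.class);

    job.setMapperClass(WordCountMapper.class);    // Mapper<LongWritable, Text, Text, IntWritable>
    job.setReducerClass(WordCountReducer.class);  // Reducer<Text, IntWritable, Text, IntWritable>

    // Final output types (K3, V3). Because the intermediate types (K2, V2) are
    // the same here, setMapOutputKeyClass() and setMapOutputValueClass() can be
    // omitted: they fall back to these values.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}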
It may seem strange that these methods for setting the intermediate and final output types exist at all. After all, why can't the types be determined from a combination of the mapper and the reducer? The answer has to do with a limitation in Java generics: type erasure means that the type information isn't always present at runtime, so Hadoop has to be given it explicitly. This also means that it's possible to configure a MapReduce job with incompatible types, because the configuration isn't checked at compile time. The settings that have to be compatible with the configured types are listed in the lower part of each table.
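For example (a hedged sketch, with WordCountMapper again a placeholder), nothing stops you from declaring map output types that disagree with the mapper's actual type parameters; the mismatch only surfaces when the job runs:

// The mapper is parameterized as Mapper<LongWritable, Text, Text, IntWritable>,
// but the job declares the map output key class as IntWritable. This compiles
// cleanly because the generic type arguments are erased; it fails only when the
// first Text key is written at runtime, with an error along the lines of
// "Type mismatch in key from map".
job.setMapperClass(WordCountMapper.class);
job.setMapOutputKeyClass(IntWritable.class);    // wrong: the mapper emits Text keys
job.setMapOutputValueClass(IntWritable.class);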