Appendix D. The Old and New Java MapReduce APIs
The Java MapReduce API used throughout this book is called the “new API,” and it
replaces the older, functionally equivalent API. Although Hadoop ships with both the old and
new MapReduce APIs, they are not compatible with each other. Should you wish to use the
old API, you can, since the code for all the MapReduce examples in this book is available
for the old API on the book's website (in the oldapi package).
There are several notable differences between the two APIs:
▪ The new API is in the org.apache.hadoop.mapreduce package (and
subpackages). The old API can still be found in org.apache.hadoop.mapred.
▪ The new API favors abstract classes over interfaces, since these are easier to
evolve. This means that you can add a method (with a default implementation) to
an abstract class without breaking old implementations of the class. [168] For
example, the Mapper and Reducer interfaces in the old API are abstract classes in
the new API.
▪ The new API makes extensive use of context objects that allow the user code to
communicate with the MapReduce system. The new Context, for example,
essentially unifies the role of the JobConf, the OutputCollector, and the
Reporter from the old API.
▪ In both APIs, key-value record pairs are pushed to the mapper and reducer, but in
addition, the new API allows both mappers and reducers to control the execution
flow by overriding the run() method. For example, records can be processed in
batches, or the execution can be terminated before all the records have been
processed. In the old API, this is possible for mappers by writing a MapRunnable,
but no equivalent exists for reducers.
▪ Job control is performed through the Job class in the new API, rather than the old
JobClient, which no longer exists in the new API.
▪ Configuration has been unified in the new API. The old API has a special
JobConf object for job configuration, which is an extension of Hadoop's vanilla
Configuration object (used for configuring daemons; see The Configuration
API). In the new API, job configuration is done through a Configuration,
possibly via some of the helper methods on Job.
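
As a rough illustration of the abstract-class, context-object, and run() points in the list above, the following sketch models the pattern in plain Java. It does not use Hadoop's classes: SketchContext, SketchMapper, UpperCaseMapper, LimitedMapper, and runMapper are hypothetical names invented for this example, and the real Mapper and Context have richer, generic-typed signatures. The sketch only shows how a base class with default implementations lets one subclass customize map() alone while another overrides run() to stop before all records have been processed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RunOverrideSketch {

    /** Hypothetical stand-in for a context object: supplies input records and collects output. */
    static class SketchContext {
        private final Iterator<String> input;
        private final List<String> output = new ArrayList<>();
        private String current;

        SketchContext(List<String> records) {
            this.input = records.iterator();
        }

        /** Advances to the next record; returns false when the input is exhausted. */
        boolean nextRecord() {
            if (!input.hasNext()) {
                return false;
            }
            current = input.next();
            return true;
        }

        String currentRecord() {
            return current;
        }

        void write(String value) {
            output.add(value);
        }

        List<String> output() {
            return output;
        }
    }

    /** Hypothetical abstract mapper: every method has a default implementation. */
    static abstract class SketchMapper {
        /** Default map() passes the record through unchanged. */
        protected void map(String record, SketchContext context) {
            context.write(record);
        }

        /** Default execution flow: push every input record to map(). */
        public void run(SketchContext context) {
            while (context.nextRecord()) {
                map(context.currentRecord(), context);
            }
        }
    }

    /** Keeps the default run() and customizes only map(). */
    static class UpperCaseMapper extends SketchMapper {
        @Override
        protected void map(String record, SketchContext context) {
            context.write(record.toUpperCase());
        }
    }

    /** Overrides run() to terminate after a fixed number of records. */
    static class LimitedMapper extends UpperCaseMapper {
        private final int limit;

        LimitedMapper(int limit) {
            this.limit = limit;
        }

        @Override
        public void run(SketchContext context) {
            int seen = 0;
            // Check the limit first so no extra record is consumed.
            while (seen < limit && context.nextRecord()) {
                map(context.currentRecord(), context);
                seen++;
            }
        }
    }

    /** Runs a mapper over the given records and returns what it wrote to the context. */
    static List<String> runMapper(SketchMapper mapper, List<String> records) {
        SketchContext context = new SketchContext(records);
        mapper.run(context);
        return context.output();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d");
        System.out.println(runMapper(new UpperCaseMapper(), records)); // [A, B, C, D]
        System.out.println(runMapper(new LimitedMapper(2), records));  // [A, B]
    }
}
```

Because the base class supplies a default run(), adding that method would not have broken UpperCaseMapper, which never mentions it. This is the kind of evolution that is harder to achieve with an interface that every implementation must satisfy in full.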