▪ Output files are named slightly differently: in the old API both map and reduce outputs are named part-nnnnn, whereas in the new API map outputs are named part-m-nnnnn and reduce outputs are named part-r-nnnnn (where nnnnn is an integer designating the part number, starting from 00000).
▪ User-overridable methods in the new API are declared to throw java.lang.InterruptedException. This means that you can write your code to be responsive to interrupts so that the framework can gracefully cancel long-running operations if it needs to.
▪ In the new API, the reduce() method passes values as a java.lang.Iterable, rather than a java.lang.Iterator (as the old API does). This change makes it easier to iterate over the values using Java's for-each loop construct: for (VALUEIN value : values) { ... }
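The interrupt-handling point above can be sketched in plain Java, with no Hadoop dependency: a worker method that declares throws InterruptedException (as the new-API map() and reduce() methods do) can be cancelled cleanly by whoever interrupts its thread. The names InterruptibleTask and processRecords are hypothetical, chosen for illustration only.

```java
// A minimal sketch of interrupt-responsive code. Blocking calls such as
// Thread.sleep() throw InterruptedException when the thread is interrupted,
// which lets a caller (the "framework" here) cancel the work gracefully.
public class InterruptibleTask {

    // Analogous to a new-API map() or reduce() method: by declaring
    // InterruptedException, it cooperates with cancellation.
    static void processRecords(int n) throws InterruptedException {
        for (int i = 0; i < n; i++) {
            Thread.sleep(10); // throws InterruptedException if interrupted
        }
    }

    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            try {
                processRecords(1000); // would take ~10 seconds uninterrupted
                System.out.println("finished");
            } catch (InterruptedException e) {
                System.out.println("cancelled gracefully");
            }
        });
        worker.start();
        Thread.sleep(50);
        worker.interrupt(); // the framework decides to cancel the task
        worker.join();      // prints "cancelled gracefully"
    }
}
```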
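The Iterable-versus-Iterator difference can be shown in plain Java, without Hadoop: a for-each loop works directly on any Iterable, whereas a bare Iterator must be stepped by hand. The class name IterableVsIterator is hypothetical, for illustration only.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterableVsIterator {
    public static void main(String[] args) {
        // Stands in for the values passed to reduce().
        List<Integer> values = Arrays.asList(3, 1, 4);

        // New-API style: values is an Iterable, so for-each works directly.
        int sumIterable = 0;
        for (int value : values) {
            sumIterable += value;
        }

        // Old-API style: with a bare Iterator, you drive the loop manually.
        int sumIterator = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            sumIterator += it.next();
        }

        System.out.println(sumIterable + " " + sumIterator); // prints "8 8"
    }
}
```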
WARNING
Programs using the new API that were compiled against Hadoop 1 need to be recompiled to run against
Hadoop 2. This is because some classes in the new MapReduce API changed to interfaces between the
Hadoop 1 and Hadoop 2 releases. The symptom is an error at runtime like the following:
java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
Example D-1 shows the MaxTemperature application rewritten to use the old API. The differences are highlighted in bold.
WARNING
When converting your Mapper and Reducer classes to the new API, don't forget to change the signatures of the map() and reduce() methods to the new form. Just changing your class to extend the new Mapper or Reducer classes will not produce a compilation error or warning, because these classes provide identity forms of the map() and reduce() methods (respectively). Your mapper or reducer code, however, will not be invoked, which can lead to some hard-to-diagnose errors.
Annotating your map() and reduce() methods with the @Override annotation will allow the Java compiler to catch these errors.
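The pitfall can be demonstrated in plain Java, with no Hadoop dependency. Here BaseMapper is a stand-in for the new-API Mapper class with its identity map() implementation; BrokenMapper, FixedMapper, and OverridePitfall are hypothetical names for illustration.

```java
class BaseMapper {
    // Identity implementation, like the new-API Mapper.map().
    String map(String value) { return value; }
}

class BrokenMapper extends BaseMapper {
    // Wrong parameter type: this compiles without error or warning, but it
    // OVERLOADS rather than overrides map(), so it is never invoked through
    // a BaseMapper reference.
    String map(StringBuilder value) { return "processed:" + value; }
}

class FixedMapper extends BaseMapper {
    // With @Override, a mismatched signature becomes a compile-time error.
    @Override
    String map(String value) { return "processed:" + value; }
}

public class OverridePitfall {
    public static void main(String[] args) {
        BaseMapper broken = new BrokenMapper();
        BaseMapper fixed = new FixedMapper();
        System.out.println(broken.map("input")); // prints "input" (identity!)
        System.out.println(fixed.map("input"));  // prints "processed:input"
    }
}
```

Calling broken.map("input") silently falls through to the identity method, which is exactly the hard-to-diagnose behavior the warning describes.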
Example D-1. Application to find the maximum temperature, using the old MapReduce API
public class OldMaxTemperature {