      context) throws IOException, InterruptedException;
}
The client running the job calculates the splits for the job by calling getSplits(), then sends them to the application master, which uses their storage locations to schedule map tasks that will process them on the cluster. The map task passes the split to the createRecordReader() method on InputFormat to obtain a RecordReader for that split. A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs, which it passes to the map function. We can see this by looking at the Mapper's run() method:
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}
After running setup(), nextKeyValue() is called repeatedly on the Context (which delegates to the identically named method on the RecordReader) to populate the key and value objects for the mapper. The key and value are retrieved from the RecordReader by way of the Context and are passed to the map() method for it to do its work. When the reader gets to the end of the stream, the nextKeyValue() method returns false, and the map task runs its cleanup() method and then completes.
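The control flow described above can be illustrated without any Hadoop dependency. The following is a minimal sketch only: ToyLineReader and ToyMapper are hypothetical stand-ins invented for this illustration, not Hadoop classes, and the reader simply walks over lines held in memory, using each line's character offset as the key.

```java
import java.util.ArrayList;
import java.util.List;

class RunLoopSketch {
    public static void main(String[] args) {
        ToyMapper mapper = new ToyMapper();
        mapper.run(new ToyLineReader("alpha\nbeta\ngamma"));
        System.out.println(mapper.log); // [setup, 0:alpha, 6:beta, 11:gamma, cleanup]
    }
}

// Hypothetical stand-in for a record reader: an iterator over in-memory lines.
class ToyLineReader {
    private final String[] lines;
    private int index = -1;   // index of the current record, -1 before the first
    private long offset = 0;  // character offset at which the next line starts
    private long currentKey;
    private String currentValue;

    ToyLineReader(String text) {
        this.lines = text.isEmpty() ? new String[0] : text.split("\n");
    }

    // Advances to the next record; returns false once the input is exhausted.
    boolean nextKeyValue() {
        if (index + 1 >= lines.length) {
            return false;
        }
        index++;
        currentKey = offset;
        currentValue = lines[index];
        offset += currentValue.length() + 1; // +1 accounts for the newline
        return true;
    }

    long getCurrentKey() { return currentKey; }
    String getCurrentValue() { return currentValue; }
}

// Hypothetical mapper whose run() has the same shape as the loop shown earlier.
class ToyMapper {
    final List<String> log = new ArrayList<>();

    void setup() { log.add("setup"); }
    void map(long key, String value) { log.add(key + ":" + value); }
    void cleanup() { log.add("cleanup"); }

    void run(ToyLineReader reader) {
        setup();
        while (reader.nextKeyValue()) {
            map(reader.getCurrentKey(), reader.getCurrentValue());
        }
        cleanup();
    }
}
```

With empty input the loop body never executes, so only setup() and cleanup() run.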
WARNING
Although it's not shown in the code snippet, for reasons of efficiency, RecordReader implementations will return the same key and value objects on each call to getCurrentKey() and getCurrentValue(). Only the contents of these objects are changed by the reader's nextKeyValue() method. This can be a surprise to users, who might expect keys and values to be immutable and not to be reused. This causes problems when a reference to a key or value object is retained outside the map() method, as its value can change without warning. If you need to do this, make a copy of the object you want to hold on to. For example, for a Text object, you can use its copy constructor: new Text(value).
The situation is similar with reducers. In this case, the value objects in the reducer's iterator are reused, so you need to copy any that you need to retain between calls to the iterator (see Example 9-11).
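The reuse pitfall can be reproduced in a few lines of plain Java. This is a sketch under stated assumptions: MutableText and ReusingReader are hypothetical classes written for this illustration (MutableText merely imitates a mutable value type with a copy constructor), and none of it uses the Hadoop API. The same reasoning would apply to values drawn from a reducer's iterator.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseSketch {
    public static void main(String[] args) {
        String[] records = {"a", "b", "c"};
        System.out.println(ReuseDemo.retainReferences(records)); // [c, c, c]
        System.out.println(ReuseDemo.retainCopies(records));     // [a, b, c]
    }
}

// Hypothetical mutable value holder with a copy constructor.
class MutableText {
    private String contents = "";

    MutableText() {}
    MutableText(MutableText other) { this.contents = other.contents; }

    void set(String s) { contents = s; }
    @Override public String toString() { return contents; }
}

// Hypothetical reader that returns the SAME value object for every record
// and only changes its contents in nextKeyValue().
class ReusingReader {
    private final String[] records;
    private final MutableText value = new MutableText();
    private int next = 0;

    ReusingReader(String[] records) { this.records = records; }

    boolean nextKeyValue() {
        if (next >= records.length) {
            return false;
        }
        value.set(records[next++]);
        return true;
    }

    MutableText getCurrentValue() { return value; }
}

class ReuseDemo {
    // Buggy: every list element is the same object, so all show the last record.
    static List<String> retainReferences(String[] records) {
        ReusingReader reader = new ReusingReader(records);
        List<MutableText> kept = new ArrayList<>();
        while (reader.nextKeyValue()) {
            kept.add(reader.getCurrentValue());
        }
        return render(kept);
    }

    // Correct: copy each value before the reader overwrites it.
    static List<String> retainCopies(String[] records) {
        ReusingReader reader = new ReusingReader(records);
        List<MutableText> kept = new ArrayList<>();
        while (reader.nextKeyValue()) {
            kept.add(new MutableText(reader.getCurrentValue()));
        }
        return render(kept);
    }

    private static List<String> render(List<MutableText> kept) {
        List<String> out = new ArrayList<>();
        for (MutableText t : kept) {
            out.add(t.toString());
        }
        return out;
    }
}
```

Copying costs one allocation per retained record, so it is worth doing only for the objects that actually need to outlive the current call.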