context)
throws IOException, InterruptedException;
}
The client running the job calculates the splits for the job by calling getSplits(), then
sends them to the application master, which uses their storage locations to schedule map
tasks that will process them on the cluster. The map task passes the split to the
createRecordReader() method on InputFormat to obtain a RecordReader for
that split. A RecordReader is little more than an iterator over records, and the map task
uses one to generate record key-value pairs, which it passes to the map function. We can
see this by looking at the Mapper's run() method:
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}
After running setup(), nextKeyValue() is called repeatedly on the Context
(which delegates to the identically named method on the RecordReader) to populate
the key and value objects for the mapper. The key and value are retrieved from the
RecordReader by way of the Context and are passed to the map() method for it to do
its work. When the reader gets to the end of the stream, the nextKeyValue() method
returns false, and the map task runs its cleanup() method and then completes.
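The control flow described above can be imitated without a Hadoop installation. The following is a minimal sketch only: SimpleRecordReader, LineReader, and RunLoopSketch are hypothetical stand-ins invented for illustration, not Hadoop classes, and the "map" step merely records what it was handed.

```java
import java.util.ArrayList;
import java.util.List;

public class RunLoopSketch {

    /** Hypothetical stand-in for a record reader; NOT Hadoop's RecordReader. */
    interface SimpleRecordReader<K, V> {
        boolean nextKeyValue();   // advance to the next record, false at end of stream
        K getCurrentKey();
        V getCurrentValue();
    }

    /** Hypothetical reader over in-memory lines, keyed by line index. */
    static class LineReader implements SimpleRecordReader<Long, String> {
        private final String[] lines;
        private int pos = -1;     // index of the current record; -1 before the first call

        LineReader(String... lines) {
            this.lines = lines;
        }

        @Override
        public boolean nextKeyValue() {
            if (pos + 1 >= lines.length) {
                return false;     // end of stream
            }
            pos++;
            return true;
        }

        @Override
        public Long getCurrentKey() {
            return (long) pos;
        }

        @Override
        public String getCurrentValue() {
            return lines[pos];
        }
    }

    /** Same shape as the run() method above: setup, loop over records, cleanup. */
    static List<String> run(SimpleRecordReader<Long, String> reader) {
        List<String> log = new ArrayList<>();
        log.add("setup");
        while (reader.nextKeyValue()) {
            // Stand-in for map(key, value, context): just record the pair.
            log.add("map(" + reader.getCurrentKey() + ", " + reader.getCurrentValue() + ")");
        }
        log.add("cleanup");
        return log;
    }

    public static void main(String[] args) {
        for (String entry : run(new LineReader("a", "b"))) {
            System.out.println(entry);
        }
    }
}
```

With two input lines, the log contains one setup entry, two map entries, and one cleanup entry, in that order. With no input lines, the loop body never runs and only setup and cleanup are recorded.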
WARNING
Although it's not shown in the code snippet, for reasons of efficiency, RecordReader
implementations will return the same key and value objects on each call to getCurrentKey() and
getCurrentValue(). Only the contents of these objects are changed by the reader's nextKeyValue()
method. This can be a surprise to users, who might expect keys and values to be immutable and not to be
reused. Reuse causes problems when a reference to a key or value object is retained outside the map()
method, as its value can change without warning. If you need to retain such a reference, make a copy of
the object you want to hold on to. For example, for a Text object, you can use its copy constructor:
new Text(value).
The situation is similar with reducers. In this case, the value objects in the reducer's iterator are reused,
so you need to copy any that you need to retain between calls to the iterator (see Example 9-11).
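The reuse pitfall is easy to reproduce in plain Java. This is an illustrative sketch under stated assumptions: MutableText is a hypothetical mutable holder that plays the role of a reused value object such as Text, and the loops imitate a reader that overwrites one object per record. None of these names come from Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseSketch {

    /** Hypothetical mutable holder standing in for a reused value object. */
    static class MutableText {
        private String s = "";

        MutableText() {
        }

        /** Copy constructor, analogous in spirit to new Text(value). */
        MutableText(MutableText other) {
            this.s = other.s;
        }

        void set(String s) {
            this.s = s;
        }

        @Override
        public String toString() {
            return s;
        }
    }

    /** Buggy: stores the same reused object every time. */
    static List<String> retainReferences(String[] records) {
        MutableText reused = new MutableText();
        List<MutableText> kept = new ArrayList<>();
        for (String r : records) {
            reused.set(r);      // the "reader" overwrites the contents in place
            kept.add(reused);   // every list element is the same object
        }
        return render(kept);
    }

    /** Correct: stores an independent copy of each value. */
    static List<String> retainCopies(String[] records) {
        MutableText reused = new MutableText();
        List<MutableText> kept = new ArrayList<>();
        for (String r : records) {
            reused.set(r);
            kept.add(new MutableText(reused));  // snapshot of the current contents
        }
        return render(kept);
    }

    private static List<String> render(List<MutableText> items) {
        List<String> out = new ArrayList<>();
        for (MutableText t : items) {
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "c"};
        System.out.println(retainReferences(records)); // prints [c, c, c]
        System.out.println(retainCopies(records));     // prints [a, b, c]
    }
}
```

Retaining references leaves every stored element showing the last record, because all of them point at the one object the reader keeps overwriting. Copying at the moment of retention preserves each value.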