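import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  // Never split an input file: each file becomes exactly one split.
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;
  }

  // Return the custom reader that delivers the whole file as a single record.
  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    WholeFileRecordReader reader = new WholeFileRecordReader();
    reader.initialize(split, context);
    return reader;
  }
}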
WholeFileInputFormat defines a format where the keys are not used, represented by NullWritable, and the values are the file contents, represented by BytesWritable instances. It defines two methods. First, the format is careful to specify that input files should never be split, by overriding isSplitable() to return false. Second, we implement createRecordReader() to return a custom implementation of RecordReader, which appears in Example 8-3.
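To see why returning false from isSplitable() matters, it can help to model how a file is carved into splits. The following is a minimal sketch in plain Java with no Hadoop dependency; the class SplitSketch, its Range type, and the computeSplits() method are hypothetical names introduced here for illustration, and the fixed-size chunking is a deliberate simplification rather than the logic FileInputFormat actually uses.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

  /** A byte range of a file; a hypothetical stand-in for a file split. */
  public static final class Range {
    public final long start;
    public final long length;

    Range(long start, long length) {
      this.start = start;
      this.length = length;
    }
  }

  /**
   * Simplified model of split calculation. A non-splittable file yields
   * one range covering the whole file; a splittable file is cut into
   * consecutive chunks of at most splitSize bytes.
   */
  public static List<Range> computeSplits(long fileLength, long splitSize,
      boolean splittable) {
    List<Range> splits = new ArrayList<>();
    if (!splittable) {
      splits.add(new Range(0, fileLength));
      return splits;
    }
    for (long offset = 0; offset < fileLength; offset += splitSize) {
      splits.add(new Range(offset, Math.min(splitSize, fileLength - offset)));
    }
    return splits;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    // A 300 MB file with a 128 MB split size.
    System.out.println(computeSplits(300 * mb, 128 * mb, true).size());  // prints 3
    System.out.println(computeSplits(300 * mb, 128 * mb, false).size()); // prints 1
  }
}
```

In this model the non-splittable case always produces a single range starting at offset 0, so one reader sees the entire file, which is the property the whole-file record reader relies on.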
Example 8-3. The RecordReader used by WholeFileInputFormat for reading a whole file as a record
class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {

  private FileSplit fileSplit;
  private Configuration conf;
  private BytesWritable value = new BytesWritable();
  private boolean processed = false;

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    this.fileSplit = (FileSplit) split;
    this.conf = context.getConfiguration();
  }

  @Override
  public boolean nextKeyValue() throws IOException,