}
}
File information in the mapper
A mapper processing a file input split can find information about the split by calling the getInputSplit() method on the Mapper's Context object. When the input format derives from FileInputFormat, the InputSplit returned by this method can be cast to a FileSplit to access the file information listed in Table 8-7.
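Checking the split type and casting it to a FileSplit can be sketched in plain Java. To keep the example runnable without Hadoop on the classpath, InputSplit and FileSplit below are simplified, hypothetical stand-ins that mirror only the accessors in Table 8-7. They are not the real Hadoop classes: the real FileSplit.getPath() returns a Path rather than a String, and in a real mapper the split would come from context.getInputSplit().

```java
import java.util.Optional;

// Hypothetical stand-ins for Hadoop's InputSplit and FileSplit so that this
// sketch compiles without Hadoop. They only mirror the Table 8-7 accessors.
interface InputSplit {
    long getLength();
}

final class FileSplit implements InputSplit {
    private final String path;   // the real class returns a Path here
    private final long start;
    private final long length;

    FileSplit(String path, long start, long length) {
        this.path = path;
        this.start = start;
        this.length = length;
    }

    String getPath() { return path; }
    long getStart() { return start; }
    @Override public long getLength() { return length; }
}

public class SplitInfo {
    // Returns the file path only when the split is a FileSplit;
    // any other kind of split yields an empty Optional instead of a ClassCastException.
    static Optional<String> pathOf(InputSplit split) {
        if (split instanceof FileSplit) {
            FileSplit fileSplit = (FileSplit) split;
            return Optional.of(fileSplit.getPath());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Example values only.
        InputSplit split = new FileSplit("/data/input/part-0000.txt", 134217728L, 67108864L);
        System.out.println(pathOf(split).orElse("(not a file split)"));
    }
}
```

Guarding the cast with instanceof is a defensive choice for mappers that might be reused with non-file input formats; a mapper that is only ever used with FileInputFormat subclasses could cast directly.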
In the old MapReduce API and the Streaming interface, the same file split information is made available through properties that can be read from the mapper's configuration. (In the old MapReduce API this is achieved by implementing configure() in your Mapper implementation to get access to the JobConf object.)
In addition to the properties in Table 8-7, all mappers and reducers have access to the properties listed in "The Task Execution Environment".
Table 8-7. File split properties

FileSplit method  Property name               Type           Description
getPath()         mapreduce.map.input.file    Path / String  The path of the input file being processed
getStart()        mapreduce.map.input.start   long           The byte offset of the start of the split from the beginning of the file
getLength()       mapreduce.map.input.length  long           The length of the split in bytes
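Reading the Table 8-7 properties from a configuration, as an old-API mapper would do in configure(), might look roughly like the following sketch. Here java.util.Properties stands in for Hadoop's JobConf (an assumption made so the example runs without Hadoop); the property names are the ones in Table 8-7, while SplitProperties, SplitDescription, and the example values are hypothetical.

```java
import java.util.Properties;

public class SplitProperties {
    // Property names from Table 8-7.
    static final String INPUT_FILE = "mapreduce.map.input.file";
    static final String INPUT_START = "mapreduce.map.input.start";
    static final String INPUT_LENGTH = "mapreduce.map.input.length";

    // Hypothetical value holder for the three split properties.
    static final class SplitDescription {
        private final String file;
        private final long start;
        private final long length;

        SplitDescription(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }

        String file() { return file; }
        long start() { return start; }
        long length() { return length; }
        // Offset one past the last byte of the split, so the split covers [start, end).
        long end() { return start + length; }
    }

    // Reads a required property, failing loudly if it is absent.
    private static String require(Properties conf, String key) {
        String value = conf.getProperty(key);
        if (value == null) {
            throw new IllegalStateException("missing property: " + key);
        }
        return value;
    }

    static SplitDescription fromConfig(Properties conf) {
        return new SplitDescription(
                require(conf, INPUT_FILE),
                Long.parseLong(require(conf, INPUT_START)),
                Long.parseLong(require(conf, INPUT_LENGTH)));
    }

    public static void main(String[] args) {
        // Example values only; in a real job the framework sets these for each split.
        Properties conf = new Properties();
        conf.setProperty(INPUT_FILE, "hdfs://namenode/data/input/part-0000.txt");
        conf.setProperty(INPUT_START, "134217728");
        conf.setProperty(INPUT_LENGTH, "67108864");

        SplitDescription split = fromConfig(conf);
        // prints "hdfs://namenode/data/input/part-0000.txt bytes [134217728, 201326592)"
        System.out.println(split.file() + " bytes [" + split.start() + ", " + split.end() + ")");
    }
}
```

With a real JobConf, the equivalent calls would be get() for the file name and getLong() for the two numeric properties.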
In the next section, we'll see how to use a FileSplit when we need to access the split's filename.
Processing a whole file as a record
A related requirement that sometimes crops up is for mappers to have access to the full contents of a file. Not splitting the file gets you part of the way there, but you also need to have a RecordReader that delivers the file contents as the value of the record. The listing in Example 8-2 shows a way of doing this.
Example 8-2. An InputFormat for reading a whole file as a record