}
}
File information in the mapper
A mapper processing a file input split can find information about the split by calling the
getInputSplit() method on the Mapper's Context object. When the input format
derives from FileInputFormat, the InputSplit returned by this method can be
cast to a FileSplit to access the file information listed in Table 8-7.
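To illustrate the downcast without a Hadoop dependency, here is a minimal sketch in plain Java. The InputSplit interface and the FileSplit and OtherSplit records below are hypothetical stand-ins that only mirror the accessor names in Table 8-7; they are not Hadoop's classes, and the example path is made up. In a real mapper the equivalent call would be FileSplit split = (FileSplit) context.getInputSplit();. The instanceof check shows how to guard against input formats that do not derive from FileInputFormat, where the cast would fail.

```java
import java.util.Optional;

public class SplitInfoDemo {

    // Hypothetical stand-in for Hadoop's InputSplit (simplified: no checked exceptions,
    // no getLocations()). Every split knows its length.
    interface InputSplit {
        long getLength();
    }

    // Hypothetical stand-in for FileSplit, exposing the accessors listed in Table 8-7.
    record FileSplit(String path, long start, long length) implements InputSplit {
        String getPath() { return path; }
        long getStart() { return start; }
        @Override public long getLength() { return length; }
    }

    // Hypothetical non-file split, e.g. one produced by an input format that is
    // not derived from FileInputFormat.
    record OtherSplit(long length) implements InputSplit {
        @Override public long getLength() { return length; }
    }

    // Describes the file region covered by the split, or returns empty if the
    // split is not file-based (the cast to FileSplit would not be valid).
    static Optional<String> describe(InputSplit split) {
        if (split instanceof FileSplit fileSplit) {
            long end = fileSplit.getStart() + fileSplit.getLength();
            return Optional.of(fileSplit.getPath() + " [" + fileSplit.getStart() + ", " + end + ")");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(describe(new FileSplit("/data/input.txt", 0, 128)).orElse("not a file split"));
        System.out.println(describe(new OtherSplit(42)).orElse("not a file split"));
    }
}
```

The end offset is exclusive: a split with start 0 and length 128 covers bytes 0 through 127.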
In the old MapReduce API and the Streaming interface, the same file split information is
made available through properties that can be read from the mapper's configuration. (In
the old MapReduce API, this is achieved by implementing configure() in your Mapper
implementation to get access to the JobConf object.)
In addition to the properties in Table 8-7, all mappers and reducers have access to the
properties listed in "The Task Execution Environment".
Table 8-7. File split properties
FileSplit method   Property name                 Type          Description
getPath()          mapreduce.map.input.file      Path/String   The path of the input file being processed
getStart()         mapreduce.map.input.start     long          The byte offset of the start of the split from the beginning of the file
getLength()        mapreduce.map.input.length    long          The length of the split in bytes
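The configuration-property route used by the old API and Streaming can be sketched the same way. This plain-Java example assumes the mapper's configuration has already been read into a Map<String, String>, a hypothetical stand-in for JobConf used only to keep the example self-contained. It parses the three properties from Table 8-7 into their listed types; SplitInfo, fromConfig, and the sample values are illustrative names and data, not part of Hadoop.

```java
import java.util.Map;

public class SplitProperties {

    // Property names as listed in Table 8-7.
    static final String FILE = "mapreduce.map.input.file";
    static final String START = "mapreduce.map.input.start";
    static final String LENGTH = "mapreduce.map.input.length";

    // Hypothetical holder for the parsed split information.
    record SplitInfo(String path, long start, long length) {
        // First byte offset after the end of the split (exclusive).
        long end() { return start + length; }
    }

    // Reads the split properties from a configuration-like map.
    // Throws IllegalArgumentException if a property is missing,
    // or NumberFormatException if start/length are not valid longs.
    static SplitInfo fromConfig(Map<String, String> conf) {
        return new SplitInfo(
                require(conf, FILE),
                Long.parseLong(require(conf, START)),
                Long.parseLong(require(conf, LENGTH)));
    }

    private static String require(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        // Made-up sample values; in a real task the framework sets these properties.
        Map<String, String> conf = Map.of(
                FILE, "/data/input.txt",
                START, "134217728",
                LENGTH, "67108864");
        SplitInfo info = fromConfig(conf);
        System.out.println(info.path() + " bytes " + info.start() + " to " + info.end());
    }
}
```

Failing fast on a missing property makes it obvious when the mapper is not reading a file-based split, rather than silently using a default.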
In the next section, we'll see how to use a FileSplit when we need to access the
split's filename.
Processing a whole file as a record
A related requirement that sometimes crops up is for mappers to have access to the full
contents of a file. Not splitting the file gets you part of the way there, but you also need to
have a RecordReader that delivers the file contents as the value of the record. The listing
for WholeFileInputFormat in Example 8-2 shows a way of doing this.
Example 8-2. An InputFormat for reading a whole file as a record