                                               IOException {
        lineReader = new KeyValueLineRecordReader(job, split);
        lineKey = lineReader.createKey();
        lineValue = lineReader.createValue();
    }

    public boolean next(Text key, URLWritable value) throws IOException {
        if (!lineReader.next(lineKey, lineValue)) {
            return false;
        }
        key.set(lineKey);
        value.set(lineValue.toString());
        return true;
    }

    public Text createKey() {
        return new Text("");
    }

    public URLWritable createValue() {
        return new URLWritable();
    }

    public long getPos() throws IOException {
        return lineReader.getPos();
    }

    public float getProgress() throws IOException {
        return lineReader.getProgress();
    }

    public void close() throws IOException {
        lineReader.close();
    }
}
Our TimeUrlLineRecordReader class creates a KeyValueLineRecordReader object and passes the getPos(), getProgress(), and close() method calls directly to it. The next() method does not cast; it converts the lineValue Text object into the URLWritable type by calling toString() on the Text and set() on the URLWritable.
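The listing above implies only that URLWritable has a no-argument constructor and a set(String) method. As a point of reference, a custom value type for the old MapReduce API might look like the following sketch; the field name and URL-parsing behavior are assumptions, and the "implements org.apache.hadoop.io.Writable" clause is left out so the sketch compiles without Hadoop on the classpath:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical value type used by TimeUrlLineRecordReader. In a real job
// this class would declare "implements org.apache.hadoop.io.Writable".
class URLWritable {
    protected URL url;

    public URLWritable() {}

    // Called by next() in the record reader to fill in the value.
    public void set(String s) {
        try {
            this.url = new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a valid URL: " + s, e);
        }
    }

    // Writable contract: serialize the state to a binary stream.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url.toString());
    }

    // Writable contract: restore the state from a binary stream.
    public void readFields(DataInput in) throws IOException {
        set(in.readUTF());
    }

    @Override
    public String toString() {
        return url.toString();
    }
}
```

Because write() and readFields() are mirror images of each other, a value serialized by one mapper task can be deserialized unchanged on the reducer side.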
3.3.2 OutputFormat
MapReduce outputs data into files using the OutputFormat class, which is analogous to the InputFormat class. The output has no splits, as each reducer writes its output only to its own file. The output files reside in a common directory and are typically named part-nnnnn, where nnnnn is the partition ID of the reducer.
RecordWriter objects format the output, just as RecordReader objects parse the format of the input.
Hadoop provides several standard implementations of OutputFormat, as shown in table 3.5. Not surprisingly, almost all the ones we deal with inherit from the FileOutputFormat abstract class; InputFormat classes inherit from FileInputFormat.
You specify the OutputFormat by calling setOutputFormat() of the JobConf object that holds the configuration of your MapReduce job.
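In a job driver, that call sits alongside the rest of the job configuration. The following sketch shows where it fits; the driver class name MyJob, the job name, and the choice of TextOutputFormat are illustrative, not taken from the text:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

public class MyJob {
    public static void main(String[] args) throws Exception {
        // JobConf holds the full configuration of the MapReduce job.
        JobConf conf = new JobConf(MyJob.class);
        conf.setJobName("url-example");

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Choose the OutputFormat. TextOutputFormat writes key TAB value
        // lines into the part-nnnnn files described above.
        conf.setOutputFormat(TextOutputFormat.class);

        JobClient.runJob(conf);
    }
}
```

This is a configuration fragment for the old (org.apache.hadoop.mapred) API that the chapter uses; it needs the Hadoop libraries on the classpath to compile and run.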