IOException {
    lineReader = new KeyValueLineRecordReader(job, split);
    lineKey = lineReader.createKey();
    lineValue = lineReader.createValue();
}

public boolean next(Text key, URLWritable value) throws IOException {
    if (!lineReader.next(lineKey, lineValue)) {
        return false;
    }
    key.set(lineKey);
    value.set(lineValue.toString());
    return true;
}

public Text createKey() {
    return new Text("");
}

public URLWritable createValue() {
    return new URLWritable();
}

public long getPos() throws IOException {
    return lineReader.getPos();
}

public float getProgress() throws IOException {
    return lineReader.getProgress();
}

public void close() throws IOException {
    lineReader.close();
}
}
Our TimeUrlLineRecordReader class creates a KeyValueLineRecordReader object and delegates the getPos(), getProgress(), and close() method calls directly to it. The next() method converts the lineValue Text object into the URLWritable type by calling value.set(lineValue.toString()).
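The record reader above assumes a URLWritable class with a set(String) method. A minimal sketch of such a class might look like the following; the java.net.URL backing field is an assumption for illustration, and the write()/readFields() signatures match the org.apache.hadoop.io.Writable interface that a real Hadoop value type would declare (omitted here so the sketch stands alone):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch of URLWritable. In a real job this class would
// implement org.apache.hadoop.io.Writable; the method signatures below
// match that interface.
public class URLWritable {
    protected java.net.URL url; // assumed backing field

    public URLWritable() { }

    // Called by TimeUrlLineRecordReader.next() with lineValue.toString()
    public void set(String s) throws IOException {
        try {
            this.url = new java.net.URL(s);
        } catch (java.net.MalformedURLException e) {
            throw new IOException("Malformed URL: " + s, e);
        }
    }

    // Serialize the URL as a UTF-8 string
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url.toString());
    }

    // Deserialize by re-parsing the stored string
    public void readFields(DataInput in) throws IOException {
        set(in.readUTF());
    }

    @Override
    public String toString() {
        return url == null ? "" : url.toString();
    }
}
```

Representing the URL as a string keeps the wire format simple and human-readable; a more compact encoding is possible but rarely worth the complexity for URL-sized values.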
3.3.2 OutputFormat
MapReduce outputs data into files using the OutputFormat class, which is analogous to the InputFormat class. The output has no splits, as each reducer writes its output only to its own file. The output files reside in a common directory and are typically named part-nnnnn, where nnnnn is the partition ID of the reducer. RecordWriter objects format the output, just as RecordReaders parse the format of the input.
Hadoop provides several standard implementations of OutputFormat, as shown in table 3.5. Not surprisingly, almost all the ones we deal with inherit from the FileOutputFormat abstract class; InputFormat classes inherit from FileInputFormat. You specify the OutputFormat by calling setOutputFormat() of the JobConf object that holds the configuration of your MapReduce job.
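As a concrete configuration fragment (the driver class name MyJob and the output path are illustrative assumptions, not from the text), selecting an OutputFormat in the old-style mapred API looks like this:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

// Fragment of a job driver: MyJob and the "output" path are hypothetical.
JobConf conf = new JobConf(MyJob.class);
conf.setOutputFormat(TextOutputFormat.class);   // one part-nnnnn file per reducer
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
FileOutputFormat.setOutputPath(conf, new Path("output"));
```

TextOutputFormat is the default, so this call is often implicit; making it explicit matters only when you switch to another implementation from table 3.5.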