                                               IOException {
        lineReader = new KeyValueLineRecordReader(job, split);
        lineKey = lineReader.createKey();
        lineValue = lineReader.createValue();
    }

    public boolean next(Text key, URLWritable value) throws IOException {
        if (!lineReader.next(lineKey, lineValue)) {
            return false;
        }
        key.set(lineKey);
        value.set(lineValue.toString());
        return true;
    }

    public Text createKey() {
        return new Text("");
    }

    public URLWritable createValue() {
        return new URLWritable();
    }

    public long getPos() throws IOException {
        return lineReader.getPos();
    }

    public float getProgress() throws IOException {
        return lineReader.getProgress();
    }

    public void close() throws IOException {
        lineReader.close();
    }
}
Our TimeUrlLineRecordReader class creates a KeyValueLineRecordReader object and passes the getPos(), getProgress(), and close() method calls directly to it. The next() method does not cast; it converts the lineValue Text object into the URLWritable type by calling toString() on the Text and set() on the URLWritable.
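The listing above implies only that URLWritable has a no-argument constructor and a set(String) method. As a point of reference, a custom value type for the old MapReduce API might look like the following sketch; the field name and URL-parsing behavior are assumptions, and the "implements org.apache.hadoop.io.Writable" clause is left out so the sketch compiles without Hadoop on the classpath:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical value type used by TimeUrlLineRecordReader. In a real job
// this class would declare "implements org.apache.hadoop.io.Writable".
class URLWritable {
    protected URL url;

    public URLWritable() {}

    // Called by next() in the record reader to fill in the value.
    public void set(String s) {
        try {
            this.url = new URL(s);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a valid URL: " + s, e);
        }
    }

    // Writable contract: serialize the state to a binary stream.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url.toString());
    }

    // Writable contract: restore the state from a binary stream.
    public void readFields(DataInput in) throws IOException {
        set(in.readUTF());
    }

    @Override
    public String toString() {
        return url.toString();
    }
}
```

Because write() and readFields() are mirror images of each other, a value serialized by one mapper task can be deserialized unchanged on the reducer side.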
3.3.2 OutputFormat
MapReduce outputs data into files using the OutputFormat class, which is analogous to the InputFormat class. The output has no splits, as each reducer writes its output only to its own file. The output files reside in a common directory and are typically named part-nnnnn, where nnnnn is the partition ID of the reducer.
RecordWriter objects format the output, just as RecordReader objects parse the format of the input.
Hadoop provides several standard implementations of OutputFormat, as shown in table 3.5. Not surprisingly, almost all the ones we deal with inherit from the FileOutputFormat abstract class; InputFormat classes inherit from FileInputFormat.
You specify the OutputFormat by calling setOutputFormat() of the JobConf object that holds the configuration of your MapReduce job.
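In a job driver, that call sits alongside the rest of the job configuration. The following sketch shows where it fits; the driver class name MyJob, the job name, and the choice of TextOutputFormat are illustrative, not taken from the text:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

public class MyJob {
    public static void main(String[] args) throws Exception {
        // JobConf holds the full configuration of the MapReduce job.
        JobConf conf = new JobConf(MyJob.class);
        conf.setJobName("url-example");

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Choose the OutputFormat. TextOutputFormat writes key TAB value
        // lines into the part-nnnnn files described above.
        conf.setOutputFormat(TextOutputFormat.class);

        JobClient.runJob(conf);
    }
}
```

This is a configuration fragment for the old (org.apache.hadoop.mapred) API that the chapter uses; it needs the Hadoop libraries on the classpath to compile and run.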