get the content one line at a time. The key returned by TextInputFormat is the byte
offset of each line, and we have yet to see any program that uses that key for its data
processing.
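The byte-offset key can be made concrete with a minimal plain-Java sketch of how TextInputFormat pairs each line with the offset at which it starts. It does not use the Hadoop API. The class LineOffsets and its records method are hypothetical names, plain long and String stand in for LongWritable and Text, and the sketch assumes UTF-8 text with '\n' line endings only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; NOT part of the Hadoop API.
public class LineOffsets {
    // One record, mirroring TextInputFormat's <LongWritable, Text> pair
    // with plain Java types.
    public static final class Record {
        public final long key;      // byte offset of the line's first byte
        public final String value;  // line content without the trailing '\n'

        Record(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Splits text on '\n' (assumption: no "\r\n" handling) and keys each
    // line by the UTF-8 byte offset at which it begins.
    public static List<Record> records(String text) {
        List<Record> out = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < text.length()) {
            int end = text.indexOf('\n', start);
            if (end < 0) end = text.length();
            String line = text.substring(start, end);
            out.add(new Record(offset, line));
            // Advance by the line's byte length plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        for (Record r : records("hello\nworld\n")) {
            System.out.println(r.key + "\t" + r.value);
        }
    }
}
```

Offsets count bytes, not characters, so a line following a multibyte character starts later than its character position would suggest.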
POPULAR INPUTFORMAT CLASSES
Table 3.4 lists other popular implementations of InputFormat along with a description of the key/value pair each one passes to the mapper.
Table 3.4 Main InputFormat classes. TextInputFormat is the default unless an alternative is specified. The object types for the key and value are also described.
TextInputFormat
Each line in the text files is a record. The key is the byte offset of the line, and the value is the content of the line.
key: LongWritable
value: Text

KeyValueTextInputFormat
Each line in the text files is a record. The first separator character divides each line: everything before the separator is the key, and everything after it is the value. The separator is set by the key.value.separator.in.input.line property, and the default is the tab (\t) character.
key: Text
value: Text

SequenceFileInputFormat<K,V>
An InputFormat for reading in sequence files. The key and value are user defined. A sequence file is a Hadoop-specific binary file format that can be compressed. It's optimized for passing data from the output of one MapReduce job to the input of another MapReduce job.
key: K (user defined)
value: V (user defined)

NLineInputFormat
Same as TextInputFormat, but each split is guaranteed to have exactly N lines. The mapred.line.input.format.linespermap property, which defaults to one, sets N.
key: LongWritable
value: Text
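NLineInputFormat's grouping can be pictured with a small plain-Java sketch that cuts a list of lines into consecutive groups of N, each group standing for one split (and so one map task). The class NLineSplits and its split method are hypothetical and not part of Hadoop. In this sketch the final group simply holds whatever lines remain when the total is not a multiple of N. With the default N of one, every line becomes its own split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; NOT part of the Hadoop API.
public class NLineSplits {
    // Groups input lines into consecutive splits of n lines each.
    // The last split holds the remaining lines if the count isn't divisible by n.
    public static List<List<String>> split(List<String> lines, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            int end = Math.min(i + n, lines.size());
            splits.add(new ArrayList<>(lines.subList(i, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "b", "c", "d", "e");
        System.out.println(split(lines, 2)); // [[a, b], [c, d], [e]]
    }
}
```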
KeyValueTextInputFormat is useful for more structured input files in which a predefined character, usually a tab (\t), separates the key and value of each line (record). For example, you may have a tab-separated data file of timestamps and URLs:
17:16:18 http://hadoop.apache.org/core/docs/r0.19.0/api/index.html
17:16:19 http://hadoop.apache.org/core/docs/r0.19.0/mapred_tutorial.html
17:16:20 http://wiki.apache.org/hadoop/GettingStartedWithHadoop
17:16:20 http://www.maxim.com/hotties/2008/finalist_gallery.aspx
17:16:25 http://wiki.apache.org/hadoop/
...
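The way KeyValueTextInputFormat turns such a line into a key/value pair can be sketched in plain Java: split at the first occurrence of the separator, so any later separators stay inside the value. The class KeyValueLine and its parse method are hypothetical, not Hadoop's record reader. Treating a line with no separator as all key and an empty value is this sketch's own choice.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical illustration class; NOT part of the Hadoop API.
public class KeyValueLine {
    // Splits a line at the FIRST occurrence of the separator.
    // If the separator is absent (this sketch's choice), the whole line
    // becomes the key and the value is empty.
    public static Map.Entry<String, String> parse(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new AbstractMap.SimpleEntry<>(line, "");
        }
        return new AbstractMap.SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        String[] lines = {
            "17:16:18\thttp://hadoop.apache.org/core/docs/r0.19.0/api/index.html",
            "17:16:25\thttp://wiki.apache.org/hadoop/"
        };
        for (String line : lines) {
            Map.Entry<String, String> kv = parse(line, '\t');
            System.out.println("key=" + kv.getKey() + " value=" + kv.getValue());
        }
    }
}
```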