Table 8-4. Input path and filter properties

Property name: mapreduce.input.fileinputformat.inputdir
Type: Comma-separated paths
Default value: None
Description: The input files for a job. Paths that contain commas should have those commas escaped by a backslash character. For example, the glob {a,b} would be escaped as {a\,b}.

Property name: mapreduce.input.pathFilter.class
Type: PathFilter classname
Default value: None
Description: The filter to apply to the input files for a job.
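
The comma-escaping rule in Table 8-4 can be illustrated with a minimal sketch. The class InputPathEscaping and its two methods are hypothetical helpers written for this example and are not part of Hadoop; they only show how a comma inside a path is kept distinct from the commas that separate paths in the property value.

```java
// Hypothetical helper for illustration only; not a Hadoop class.
final class InputPathEscaping {

    // Escape every comma inside a single path with a backslash,
    // so that {a,b} becomes {a\,b}.
    static String escapeCommas(String path) {
        return path.replace(",", "\\,");
    }

    // Build a comma-separated value in the style of
    // mapreduce.input.fileinputformat.inputdir: each path is escaped
    // first, then the paths are joined with unescaped commas.
    static String toInputDirValue(String... paths) {
        StringBuilder value = new StringBuilder();
        for (String path : paths) {
            if (value.length() > 0) {
                value.append(',');
            }
            value.append(escapeCommas(path));
        }
        return value.toString();
    }

    public static void main(String[] args) {
        // The example paths are made up for this sketch.
        System.out.println(toInputDirValue("/data/2014", "/logs/{a,b}"));
    }
}
```

In this sketch the only separator left unescaped is the one between the two paths, so a reader of the property value can tell where one path ends and the next begins. Only commas are handled here, as described in the table.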
FileInputFormat input splits
Given a set of files, how does FileInputFormat turn them into splits? FileInputFormat splits only large files, where "large" means larger than an HDFS block. The split size is normally the size of an HDFS block, which is appropriate for most applications; however, it is possible to control this value by setting various Hadoop properties, as shown in Table 8-5.
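
As a rough illustration of the rule that only files larger than the split size are divided, the following sketch computes the split lengths for a single file. The class SplitSketch and the method splitLengths are hypothetical names for this example; this is a simplification under the assumption that the split size equals the block size, and the actual FileInputFormat logic differs in its details.

```java
// Simplified illustration only; not the Hadoop implementation.
final class SplitSketch {

    // Return the length of each split for one file. A file no larger
    // than splitSize yields a single split covering the whole file;
    // a larger file is cut into splitSize pieces plus a remainder.
    static long[] splitLengths(long fileLength, long splitSize) {
        if (fileLength < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("invalid file length or split size");
        }
        if (fileLength <= splitSize) {
            return new long[] { fileLength };
        }
        // Ceiling division: number of pieces needed to cover the file.
        int count = (int) ((fileLength + splitSize - 1) / splitSize);
        long[] lengths = new long[count];
        long remaining = fileLength;
        for (int i = 0; i < count; i++) {
            lengths[i] = Math.min(splitSize, remaining);
            remaining -= lengths[i];
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed values for the example: a 300 MB file and a 128 MB block.
        for (long length : splitLengths(300 * mb, 128 * mb)) {
            System.out.println(length / mb + " MB");
        }
    }
}
```

With the assumed 128 MB split size, a 300 MB file is divided into pieces of 128 MB, 128 MB, and 44 MB, while a 100 MB file stays in one split.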