Table 8-5. Properties for controlling split size

| Property name | Type | Default value | Description |
|---|---|---|---|
| mapreduce.input.fileinputformat.split.minsize | int | 1 | The smallest valid size in bytes for a file split |
| mapreduce.input.fileinputformat.split.maxsize[a] | long | Long.MAX_VALUE (i.e., 9223372036854775807) | The largest valid size in bytes for a file split |
| dfs.blocksize | long | 128 MB (i.e., 134217728) | The size of a block in HDFS in bytes |

[a] This property is not present in the old MapReduce API (with the exception of CombineFileInputFormat). Instead, it is calculated indirectly as the size of the total input for the job divided by the guide number of map tasks specified by mapreduce.job.maps (or the setNumMapTasks() method on JobConf). Because the number of map tasks defaults to 1, this makes the maximum split size the size of the input.
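In the new MapReduce API, the first two properties can also be set through the static helper methods FileInputFormat.setMinInputSplitSize() and FileInputFormat.setMaxInputSplitSize(). The following is a minimal sketch of a job driver, assuming a hypothetical input path passed as the first command-line argument and illustrative 64 MB/256 MB bounds:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-size-demo");

        // Writes mapreduce.input.fileinputformat.split.minsize
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64 MB (illustrative)
        // Writes mapreduce.input.fileinputformat.split.maxsize
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024);  // 256 MB (illustrative)

        FileInputFormat.addInputPath(job, new Path(args[0]));
        // ... set mapper, reducer, output format, and output path, then submit
    }
}
```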
The minimum split size is usually 1 byte, although some formats have a lower bound on the split size. (For example, sequence files insert sync entries every so often in the stream, so the minimum split size has to be large enough to ensure that every split has a sync point to allow the reader to resynchronize with a record boundary. See Reading a SequenceFile.)
Applications may impose a minimum split size. By setting this to a value larger than the block size, they can force splits to be larger than a block. There is no good reason for doing this when using HDFS, because doing so will increase the number of blocks that are not local to a map task.
The maximum split size defaults to the maximum value that can be represented by a Java
long type. It has an effect only when it is less than the block size, forcing splits to be
smaller than a block.
The split size is calculated by the following formula (see the computeSplitSize() method in FileInputFormat):

max(minimumSize, min(maximumSize, blockSize))

and by default:

minimumSize < blockSize < maximumSize

so the split size is blockSize.
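To make the arithmetic concrete, here is a minimal sketch that evaluates the formula with the default values from Table 8-5 (this mirrors the shape of computeSplitSize() but is not the actual Hadoop source):

```java
public class ComputeSplitSizeDemo {
    // max(minimumSize, min(maximumSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long minSize = 1L;                    // split.minsize default
        long maxSize = Long.MAX_VALUE;        // split.maxsize default
        long blockSize = 128L * 1024 * 1024;  // dfs.blocksize default (134217728)

        // min(maxSize, blockSize) = blockSize, and max(minSize, blockSize) = blockSize,
        // so with the defaults each split matches an HDFS block.
        System.out.println(computeSplitSize(blockSize, minSize, maxSize)); // 134217728
    }
}
```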