MapReduce Features - Hadoop: The Definitive Guide


Table 9-2. Built-in MapReduce task counters

Counter

Description

Map input records (MAP_INPUT_RECORDS)

The number of input records consumed by all the maps in the job. Incremented every time a record is read from a RecordReader and passed to the map's map() method by the framework.

Split raw bytes (SPLIT_RAW_BYTES)

The number of bytes of input-split objects read by maps. These objects represent the split metadata (that is, the offset and length within a file) rather than the split data itself, so the total size should be small.

Map output records (MAP_OUTPUT_RECORDS)

The number of map output records produced by all the maps in the job. Incremented every time the collect() method is called on a map's OutputCollector.

Map output bytes (MAP_OUTPUT_BYTES)

The number of bytes of uncompressed output produced by all the maps in the job. Incremented every time the collect() method is called on a map's OutputCollector.
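The relationship between the map input and map output counters can be sketched with a toy word-count map task. This is a plain-Java model, not Hadoop code: the class MapOutputCounterSketch and its fields are hypothetical, and the byte count assumes the raw UTF-8 length of each key and value, whereas a real job counts the serialized size.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are Hadoop classes.
class MapOutputCounterSketch {
    long mapInputRecords;   // models MAP_INPUT_RECORDS
    long mapOutputRecords;  // models MAP_OUTPUT_RECORDS
    long mapOutputBytes;    // models MAP_OUTPUT_BYTES (uncompressed)
    final List<String[]> output = new ArrayList<>();

    // Stand-in for OutputCollector.collect(): each call bumps both output counters.
    void collect(String key, String value) {
        output.add(new String[] {key, value});
        mapOutputRecords++;
        // Assumption: raw UTF-8 length approximates the serialized size.
        mapOutputBytes += key.getBytes(StandardCharsets.UTF_8).length
                + value.getBytes(StandardCharsets.UTF_8).length;
    }

    // Stand-in for the framework passing one record to map().
    void map(String line) {
        mapInputRecords++;
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                collect(word, "1");
            }
        }
    }

    // Feeds every line through map() and returns the task with its counters.
    static MapOutputCounterSketch run(String... lines) {
        MapOutputCounterSketch task = new MapOutputCounterSketch();
        for (String line : lines) {
            task.map(line);
        }
        return task;
    }

    public static void main(String[] args) {
        MapOutputCounterSketch t = run("a bb a", "ccc");
        System.out.println("input records:  " + t.mapInputRecords);
        System.out.println("output records: " + t.mapOutputRecords);
        System.out.println("output bytes:   " + t.mapOutputBytes);
    }
}
```

Two input lines yield four output records here, which shows that the input and output record counts are independent: a map may emit zero, one, or many records per input record.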

Map output materialized bytes (MAP_OUTPUT_MATERIALIZED_BYTES)

The number of bytes of map output actually written to disk. If map output compression is enabled, this is reflected in the counter value.
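To see why the materialized byte count can differ from the uncompressed map output byte count, the following sketch compresses some repetitive map output and compares the two sizes. It is an illustration under assumptions: gzip from the JDK stands in for whatever codec a job is configured with, the class MaterializedBytesSketch and its methods are hypothetical, and any bookkeeping a real framework writes alongside the data is ignored.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Toy model only: gzip is a stand-in for the job's map output codec.
class MaterializedBytesSketch {

    // Size of the data after gzip compression, modelling what reaches disk.
    static long compressedSize(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    // Builds highly repetitive "word<TAB>1" records, as a word count might emit.
    static byte[] repetitiveOutput(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("hadoop\t1\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = repetitiveOutput(1000);
        System.out.println("uncompressed bytes (like MAP_OUTPUT_BYTES): " + raw.length);
        System.out.println("compressed bytes (like MAP_OUTPUT_MATERIALIZED_BYTES): "
                + compressedSize(raw));
    }
}
```

With compression enabled, a large gap between the two counters suggests the map output compresses well; with compression disabled, the two values would be expected to be close.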

Combine input records (COMBINE_INPUT_RECORDS)

The number of input records consumed by all the combiners (if any) in the job. Incremented every time a value is read from the combiner's iterator over values. Note that this count is the number of values consumed by the combiner, not the number of distinct key groups (which would not be a useful metric, since there is not necessarily one group per key for a combiner; see Combiner Functions, and also Shuffle and Sort).
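The point about key groups is easier to see with a small model. In the sketch below, a summing combiner runs once per spill, so a key that appears in two spills forms two separate groups. This is a plain-Java illustration, not Hadoop code: the class CombinerCounterSketch, the notion of a "spill" as a simple list of keys, and the combineGroups field (which has no built-in counterpart) are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these types are Hadoop classes.
class CombinerCounterSketch {
    long combineInputRecords;   // models COMBINE_INPUT_RECORDS
    long combineOutputRecords;  // models COMBINE_OUTPUT_RECORDS
    long combineGroups;         // hypothetical, tracked only for contrast

    // Runs a summing combiner over one spill of (key, 1) pairs.
    Map<String, Integer> combine(List<String> spillKeys) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String key : spillKeys) {
            combineInputRecords++;            // one value read from the iterator
            sums.merge(key, 1, Integer::sum);
        }
        combineGroups += sums.size();         // groups seen in this spill only
        combineOutputRecords += sums.size();  // one collect() per group
        return sums;
    }

    // Combines each spill independently and returns the accumulated counters.
    @SafeVarargs
    static CombinerCounterSketch run(List<String>... spills) {
        CombinerCounterSketch task = new CombinerCounterSketch();
        for (List<String> spill : spills) {
            task.combine(spill);
        }
        return task;
    }

    public static void main(String[] args) {
        // Key "a" appears in both spills, so it forms two groups.
        CombinerCounterSketch t = run(
                Arrays.asList("a", "a", "b"),
                Arrays.asList("a", "c"));
        System.out.println("combine input records:  " + t.combineInputRecords);
        System.out.println("combine output records: " + t.combineOutputRecords);
        System.out.println("groups seen:            " + t.combineGroups);
    }
}
```

There are three distinct keys in this input but four groups, because the key "a" is combined once in each spill. The number of values consumed (five) is the stable quantity, which is what the counter reports.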

Combine output records (COMBINE_OUTPUT_RECORDS)

The number of output records produced by all the combiners (if any) in the job. Incremented every time the collect() method is called on a combiner's OutputCollector.

Reduce input groups (REDUCE_INPUT_GROUPS)

The number of distinct key groups consumed by all the reducers in the job. Incremented every time the reducer's reduce() method is called by the framework.

Reduce input records (REDUCE_INPUT_RECORDS)

The number of input records consumed by all the reducers in the job. Incremented every time a value is read from the reducer's iterator over values. If reducers consume all of their inputs, this count should be the same as the count for map output records.
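The difference between reduce input groups and reduce input records can be sketched as follows. This is a plain-Java model, not Hadoop code: the class ReduceCounterSketch is hypothetical, grouping with a TreeMap stands in for the shuffle, and the example assumes that no combiner runs and that the reducer reads every value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model only: none of these types are Hadoop classes.
class ReduceCounterSketch {
    long reduceInputGroups;   // models REDUCE_INPUT_GROUPS
    long reduceInputRecords;  // models REDUCE_INPUT_RECORDS

    // Wraps the value iterator so each value read is counted.
    Iterator<Integer> counted(Iterator<Integer> values) {
        return new Iterator<Integer>() {
            public boolean hasNext() {
                return values.hasNext();
            }
            public Integer next() {
                reduceInputRecords++;
                return values.next();
            }
        };
    }

    // A summing reduce() that consumes all of its values.
    int reduce(String key, Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Groups (key, 1) pairs by key, then calls reduce() once per group.
    static ReduceCounterSketch run(List<String> mapOutputKeys) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String key : mapOutputKeys) {
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }
        ReduceCounterSketch task = new ReduceCounterSketch();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            task.reduceInputGroups++;  // the framework is about to call reduce()
            task.reduce(e.getKey(), task.counted(e.getValue().iterator()));
        }
        return task;
    }

    public static void main(String[] args) {
        ReduceCounterSketch t = run(Arrays.asList("a", "bb", "a", "ccc"));
        System.out.println("reduce input groups:  " + t.reduceInputGroups);
        System.out.println("reduce input records: " + t.reduceInputRecords);
    }
}
```

Four map output records arrive as three key groups, and because the reducer drains every iterator, the record count matches the number of map output records. A reducer that stops reading early would report fewer records than the maps emitted.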
