MapReduce - Hadoop: The Definitive Guide

Map-Reduce Framework
    Map input records=5
    Map output records=5
    Map output bytes=45
    Map output materialized bytes=61
    Input split bytes=129
    Combine input records=0
    Combine output records=0
    Reduce input groups=2
    Reduce shuffle bytes=61
    Reduce input records=5
    Reduce output records=2
    Spilled Records=10
    Shuffled Maps =1
    Failed Shuffles=0
    Merged Map outputs=1
    GC time elapsed (ms)=39
    Total committed heap usage (bytes)=226754560
File Input Format Counters
    Bytes Read=529
File Output Format Counters
    Bytes Written=29
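The counters are printed as plain "name=value" lines, so they are easy to check mechanically. The following is a minimal sketch, not part of Hadoop: the class CounterCheck and its parse method are hypothetical names. It reads a subset of the lines above and checks two relationships that hold in this run. Because Combine input records is 0 (no combiner ran), every map output record should arrive at the reducer. Because there is one map and one reducer, the reducer should shuffle exactly the map's materialized output bytes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterCheck {
    // Parse "Name=value" lines into a map. Lines without '=' (group headings
    // such as "Map-Reduce Framework") are skipped. Names are trimmed, so
    // "Shuffled Maps =1" is stored under "Shuffled Maps".
    static Map<String, Long> parse(String text) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = line.substring(0, eq).trim();
            long value = Long.parseLong(line.substring(eq + 1).trim());
            counters.put(name, value);
        }
        return counters;
    }

    public static void main(String[] args) {
        // Values copied from the job output shown above.
        String output = String.join("\n",
                "Map-Reduce Framework",
                "    Map output records=5",
                "    Map output materialized bytes=61",
                "    Combine input records=0",
                "    Reduce shuffle bytes=61",
                "    Reduce input records=5",
                "    Shuffled Maps =1");
        Map<String, Long> c = parse(output);

        // No combiner ran, so map output records should equal reduce input records.
        boolean recordsMatch =
                c.get("Map output records").equals(c.get("Reduce input records"));
        // One map, one reducer: the reducer fetches all materialized map output.
        boolean bytesMatch =
                c.get("Map output materialized bytes").equals(c.get("Reduce shuffle bytes"));

        System.out.println("records match: " + recordsMatch);
        System.out.println("bytes match: " + bytesMatch);
    }
}
```

With the values from this run, both checks print true. In a job with a combiner or several reducers these equalities would not be expected to hold, so treat them as properties of this particular run rather than general invariants.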

When the hadoop command is invoked with a classname as the first argument, it launches a Java virtual machine (JVM) to run the class. The hadoop command adds the Hadoop libraries (and their dependencies) to the classpath and picks up the Hadoop configuration, too. To add the application classes to the classpath, we've defined an environment variable called HADOOP_CLASSPATH, which the hadoop script picks up.
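One way to see the classpath that the hadoop script assembles is to run a tiny class through it. This is an illustrative sketch only: ShowClasspath is a hypothetical class name, and it simply reads the standard java.class.path system property of the JVM that the hadoop command launched. If the class is on HADOOP_CLASSPATH and is run as `hadoop ShowClasspath`, the listed entries should include both the application's own location and the Hadoop libraries the script added.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ShowClasspath {
    // Split a classpath string into its entries using the platform separator
    // (':' on Unix-like systems, ';' on Windows), dropping empty segments.
    static List<String> entries(String classpath) {
        return Arrays.stream(classpath.split(Pattern.quote(File.pathSeparator)))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The classpath of the JVM that is running this class.
        for (String entry : entries(System.getProperty("java.class.path"))) {
            System.out.println(entry);
        }
        // Print the variable too, to compare it with the entries above
        // (prints "null" if it is not set in this environment).
        System.out.println("HADOOP_CLASSPATH=" + System.getenv("HADOOP_CLASSPATH"));
    }
}
```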

NOTE

When running in local (standalone) mode, the programs in this topic all assume that you have set the HADOOP_CLASSPATH in this way. The commands should be run from the directory that the example code is installed in.

The output from running the job provides some useful information. For example, we can see that the job was given an ID of job_local26392882_0001, and it ran one map task and one reduce task (with the following task attempt IDs: attempt_local26392882_0001_m_000000_0 and attempt_local26392882_0001_r_000000_0). Knowing the job and task IDs can be very useful when debugging MapReduce jobs.
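The attempt IDs embed the job ID, so each one can be traced back to its job and task. The sketch below is a hypothetical helper (AttemptIdParts is not a Hadoop class) that splits an ID on underscores, assuming the layout attempt_<identifier>_<job number>_<m|r>_<task number>_<attempt number> seen in the two IDs above. In real Hadoop code, the library's own parser, org.apache.hadoop.mapreduce.TaskAttemptID.forName(String), is the safer choice.

```java
public class AttemptIdParts {
    // Hypothetical helper, assuming the layout
    // attempt_<identifier>_<job number>_<m|r>_<task number>_<attempt number>
    final String jobId;
    final String taskId;
    final String taskType;
    final int taskNumber;
    final int attemptNumber;

    AttemptIdParts(String attemptId) {
        String[] p = attemptId.split("_");
        if (p.length != 6 || !p[0].equals("attempt")) {
            throw new IllegalArgumentException("Not a task attempt ID: " + attemptId);
        }
        jobId = "job_" + p[1] + "_" + p[2];
        taskId = "task_" + p[1] + "_" + p[2] + "_" + p[3] + "_" + p[4];
        // 'm' marks a map task and 'r' a reduce task; other letters pass through.
        taskType = p[3].equals("m") ? "map" : p[3].equals("r") ? "reduce" : p[3];
        taskNumber = Integer.parseInt(p[4]);
        attemptNumber = Integer.parseInt(p[5]);
    }

    public static void main(String[] args) {
        // The two attempt IDs reported in the job output above.
        String[] ids = {
                "attempt_local26392882_0001_m_000000_0",
                "attempt_local26392882_0001_r_000000_0"};
        for (String id : ids) {
            AttemptIdParts a = new AttemptIdParts(id);
            System.out.println(a.taskType + " task " + a.taskNumber
                    + ", attempt " + a.attemptNumber + " of " + a.jobId);
        }
    }
}
```

Run against the two IDs above, both attempts resolve to the same job, job_local26392882_0001, which is the link you follow when matching task logs to a job during debugging.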
