1=973422173
2=1246032
4=10764500
5=158291879
6=40066
9=66136858
Notice that the counters for temperature have been made more readable by using a
resource bundle named after the enum (using an underscore as a separator for nested
classes), in this case MaxTemperatureWithCounters_Temperature.properties, which
contains the display name mappings.
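A resource bundle maps each counter to a display name. The sketch below assumes the key convention used by Hadoop's own counter bundles: a CounterGroupName entry for the group and one <CONSTANT>.name entry per enum constant. The display strings are made up for illustration. So that it runs on its own, the sketch reads the entries from an in-memory string with java.util.PropertyResourceBundle instead of a file on the classpath. It also shows how replacing the '$' in a nested class's binary name with '_' yields the bundle's base name. The class and helper names (CounterBundleDemo, bundleBaseName, loadBundle, displayName) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class CounterBundleDemo {

  // Stand-in for a nested counter enum such as MaxTemperatureWithCounters.Temperature
  enum Temperature { MISSING, MALFORMED }

  // Derives a bundle base name from a (possibly nested) class by replacing
  // the '$' nested-class separator in its binary name with '_'.
  static String bundleBaseName(Class<?> enumClass) {
    return enumClass.getName().replace('$', '_');
  }

  // Loads bundle entries from an in-memory string rather than a .properties
  // file on the classpath, so the sketch is self-contained.
  static ResourceBundle loadBundle(String contents) {
    try {
      return new PropertyResourceBundle(new StringReader(contents));
    } catch (IOException e) {
      // Reading from a string should not fail, but the constructor declares IOException
      throw new UncheckedIOException(e);
    }
  }

  // Looks up the display name for one counter; this sketch falls back to the
  // raw constant name when the bundle has no entry for it.
  static String displayName(ResourceBundle bundle, Enum<?> counter) {
    String key = counter.name() + ".name";
    return bundle.containsKey(key) ? bundle.getString(key) : counter.name();
  }

  public static void main(String[] args) {
    // Illustrative contents for a file like MaxTemperatureWithCounters_Temperature.properties
    String contents = "CounterGroupName=Air Temperature Records\n"
        + "MISSING.name=Missing\n"
        + "MALFORMED.name=Malformed\n";
    ResourceBundle bundle = loadBundle(contents);

    System.out.println("Bundle base name: " + bundleBaseName(Temperature.class));
    System.out.println(bundle.getString("CounterGroupName"));
    for (Temperature t : Temperature.values()) {
      System.out.println("  " + t.name() + " -> " + displayName(bundle, t));
    }
  }
}
```

Keeping the lookup keyed on the constant's name() means renaming a display string only touches the properties file, never the job code.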
Dynamic counters
The code makes use of a dynamic counter, one that isn't defined by a Java enum.
Because a Java enum's fields are defined at compile time, you can't create new counters on
the fly using enums. Here we want to count the distribution of temperature quality codes,
and though the format specification defines the values that the temperature quality code
can take, it is more convenient to use a dynamic counter to emit the values that it actually
takes. The method we use on the Context object takes the group name and the counter name
as Strings:
public Counter getCounter(String groupName, String counterName)
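In a Hadoop mapper, a dynamic counter is typically incremented with a call along the lines of context.getCounter("TemperatureQuality", quality).increment(1). Here quality is a hypothetical variable holding the quality code parsed from the record. The following sketch does not use Hadoop. It models string-keyed counters with nested maps to show why no counter names have to be declared up front: a counter comes into existence the first time it is incremented. DynamicCounters and its methods are illustrative only, not Hadoop classes, and the quality codes are invented sample values.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// A Hadoop-free model of string-named counters, for illustration only.
public class DynamicCounters {

  // group name -> (counter name -> running total); TreeMap keeps names sorted
  private final Map<String, Map<String, Long>> groups = new TreeMap<>();

  // Analogous to getCounter(groupName, counterName).increment(amount): the group
  // and the counter are created on first use, so no names are fixed in advance.
  public void increment(String groupName, String counterName, long amount) {
    groups.computeIfAbsent(groupName, g -> new TreeMap<>())
        .merge(counterName, amount, Long::sum);
  }

  // Returns the current value, or 0 for a counter that was never incremented.
  public long get(String groupName, String counterName) {
    return groups.getOrDefault(groupName, Collections.emptyMap())
        .getOrDefault(counterName, 0L);
  }

  // Read-only view of one group's counters (empty if the group is unknown).
  public Map<String, Long> group(String groupName) {
    return Collections.unmodifiableMap(
        groups.getOrDefault(groupName, Collections.emptyMap()));
  }

  public static void main(String[] args) {
    DynamicCounters counters = new DynamicCounters();
    // Made-up quality codes standing in for values parsed from weather records
    String[] qualityCodes = {"1", "1", "5", "9", "1", "5"};
    for (String quality : qualityCodes) {
      counters.increment("TemperatureQuality", quality, 1);
    }
    System.out.println(counters.group("TemperatureQuality"));
  }
}
```

Running main prints {1=3, 5=2, 9=1}. Only the codes that actually occur get a counter, and that property makes the string interface convenient for this job.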
The two ways of creating and accessing counters (using enums and using strings) are
actually equivalent, because Hadoop turns enums into strings to send counters over RPC.
Enums are slightly easier to work with, provide type safety, and are suitable for most jobs.
For the odd occasion when you need to create counters dynamically, you can use the
String interface.
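The following sketch makes the enum/string equivalence concrete by deriving a (group, counter) string pair from an enum constant. It assumes, based on our understanding of Hadoop's behavior, that the group is the enum's binary class name and the counter is the constant's name. The helper names (EnumCounterKeys, groupName, counterName) are hypothetical, and this is not Hadoop's own code.

```java
public class EnumCounterKeys {

  // Stand-in for a counter enum nested inside a job class
  enum Temperature { MISSING, MALFORMED }

  // Group name for an enum counter: the enum type's binary class name.
  // getDeclaringClass() is used instead of getClass() so that a constant with
  // its own body (compiled to an anonymous subclass) still maps to the enum type.
  static String groupName(Enum<?> counter) {
    return counter.getDeclaringClass().getName();
  }

  // Counter name for an enum counter: the constant's declared name.
  static String counterName(Enum<?> counter) {
    return counter.name();
  }

  public static void main(String[] args) {
    for (Temperature t : Temperature.values()) {
      System.out.println(groupName(t) + " / " + counterName(t));
    }
  }
}
```

For the nested enum above, main prints EnumCounterKeys$Temperature / MISSING and EnumCounterKeys$Temperature / MALFORMED. Under the stated assumption, an enum constant and its string pair identify the same counter, so the two access styles can be used interchangeably.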
Retrieving counters
In addition to using the web UI and the command line (using mapred job -counter),
you can retrieve counter values using the Java API. You can do this while the
job is running, although it is more usual to get counters at the end of a job run, when they
are stable. Example 9-2 shows a program that calculates the proportion of records that
have missing temperature fields.
Example 9-2. Application to calculate the proportion of records with missing temperature
fields
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.util.*;