import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
The Mapper class is a generic type, with four formal type parameters that specify the input key, input value, output key, and output value types of the map function. For the present example, the input key is a long integer offset, the input value is a line of text, the output key is a year, and the output value is an air temperature (an integer). Rather than using built-in Java types, Hadoop provides its own set of basic types that are optimized for network serialization. These are found in the org.apache.hadoop.io package. Here we use LongWritable, which corresponds to a Java Long, Text (like Java String), and IntWritable (like Java Integer).
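To illustrate one practical difference from the java.lang wrapper classes, here is a minimal sketch (the class name and values are hypothetical): Writables are mutable boxes whose contents can be reset, so a single object can be reused across many records.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class WritableDemo {
  public static void main(String[] args) {
    // Unlike java.lang.Integer, an IntWritable is mutable, so the
    // same object can be reused rather than reallocated per record.
    IntWritable temperature = new IntWritable(22);
    temperature.set(-11);
    Text year = new Text("1950");
    System.out.println(year + "\t" + temperature.get()); // prints: 1950	-11
  }
}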
The map() method is passed a key and a value. We convert the Text value containing the line of input into a Java String, then use its substring() method to extract the columns we are interested in.
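To make the fixed-width offsets concrete, the following sketch parses a hypothetical record padded with spaces so that the year begins at offset 15, the signed temperature (in tenths of a degree Celsius) occupies offsets 87 to 91, and the quality code sits at offset 92; real NCDC records carry many more fields.

public class ParseDemo {
  public static void main(String[] args) {
    // Hypothetical 93-character record: "1950" at offsets 15-18,
    // "-0011" (i.e., -1.1 degrees Celsius) at 87-91, quality "1" at 92.
    String line = String.format("%15s1950%68s-00111", "", "");
    System.out.println(line.substring(15, 19));                    // 1950
    System.out.println(Integer.parseInt(line.substring(87, 92)));  // -11
  }
}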
The map() method also provides an instance of Context to write the output to. In this case, we write the year as a Text object (since we are just using it as a key), and the temperature is wrapped in an IntWritable.
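To see what actually gets written to the Context, the mapper can be driven with a test harness. The following is a minimal sketch, assuming the Apache MRUnit and JUnit libraries are on the classpath; the input record is the same hypothetical padded line used in the parsing sketch above.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class MaxTemperatureMapperTest {
  @Test
  public void writesYearKeyAndTemperatureValue() throws Exception {
    // Hypothetical padded record: year 1950, temperature -1.1 degrees
    // Celsius (stored as -11 tenths), quality code 1.
    String line = String.format("%15s1950%68s-00111", "", "");
    new MapDriver<LongWritable, Text, Text, IntWritable>()
        .withMapper(new MaxTemperatureMapper())
        .withInput(new LongWritable(0), new Text(line))
        .withOutput(new Text("1950"), new IntWritable(-11))
        .runTest(); // passes only if the mapper emits exactly this pair
  }
}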