import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
The Mapper class is a generic type, with four formal type parameters that specify the input key, input value, output key, and output value types of the map function. For the present example, the input key is a long integer offset, the input value is a line of text, the output key is a year, and the output value is an air temperature (an integer). Rather than using built-in Java types, Hadoop provides its own set of basic types that are optimized for network serialization. These are found in the org.apache.hadoop.io package. Here we use LongWritable, which corresponds to a Java Long, Text (like Java String), and IntWritable (like Java Integer).
The map() method is passed a key and a value. We convert the Text value containing the line of input into a Java String, then use its substring() method to extract the columns we are interested in.
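As a quick illustration of that fixed-width parsing, the sketch below builds a hypothetical 93-character line (it is not a real weather record; the filler characters and values are ours) so that the year, sign, temperature, and quality fields sit at the offsets the mapper reads:

public class SubstringParsingSketch {
  public static void main(String[] args) {
    // Build a hypothetical line: all filler, with values placed at the
    // offsets the mapper expects.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 93; i++) {
      sb.append('0');
    }
    sb.replace(15, 19, "1950");   // year
    sb.setCharAt(87, '+');        // sign of the temperature
    sb.replace(88, 92, "0011");   // temperature (tenths of a degree)
    sb.setCharAt(92, '1');        // quality code
    String line = sb.toString();

    // The same extraction logic the mapper uses.
    String year = line.substring(15, 19);
    int airTemperature = line.charAt(87) == '+'
        ? Integer.parseInt(line.substring(88, 92))
        : Integer.parseInt(line.substring(87, 92));
    String quality = line.substring(92, 93);

    System.out.println(year + " " + airTemperature + " " + quality);  // 1950 11 1
  }
}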
The map() method also provides an instance of Context to write the output to. In this case, we write the year as a Text object (since we are just using it as a key), and the tem-