Avro MapReduce
Avro provides several classes that make it easy to run MapReduce programs on Avro
data. We'll use the new MapReduce API classes from the
org.apache.avro.mapreduce package; the classes for the old-style MapReduce API
are in the org.apache.avro.mapred package.
Let's rework the MapReduce program for finding the maximum temperature for each year
in the weather dataset, this time using the Avro MapReduce API. We will represent weather
records using the following schema:
{
  "type": "record",
  "name": "WeatherRecord",
  "doc": "A weather reading.",
  "fields": [
    {"name": "year", "type": "int"},
    {"name": "temperature", "type": "int"},
    {"name": "stationId", "type": "string"}
  ]
}
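Before bringing Avro and Hadoop into the picture, it can help to see the computation itself in isolation. The following is a minimal plain-Java sketch, not part of the Avro API: the class MaxTemperatureSketch, its nested WeatherRecord class, and the maxByYear method are hypothetical names chosen for illustration, and the sample readings are made up. It mirrors the three fields of the schema and, as an assumption about what we want from the job, keeps the whole record with the highest temperature for each year.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MaxTemperatureSketch {

  // Hypothetical plain-Java stand-in for the WeatherRecord schema
  // (this is NOT an Avro-generated or Avro generic record).
  static final class WeatherRecord {
    final int year;
    final int temperature;
    final String stationId;

    WeatherRecord(int year, int temperature, String stationId) {
      this.year = year;
      this.temperature = temperature;
      this.stationId = stationId;
    }
  }

  // Groups records by year and keeps the record with the highest
  // temperature in each group. On a tie, the earlier record is kept.
  static Map<Integer, WeatherRecord> maxByYear(List<WeatherRecord> records) {
    Map<Integer, WeatherRecord> max = new TreeMap<>();
    for (WeatherRecord r : records) {
      max.merge(r.year, r,
          (current, candidate) ->
              candidate.temperature > current.temperature ? candidate : current);
    }
    return max;
  }

  public static void main(String[] args) {
    // Made-up sample readings, for illustration only.
    List<WeatherRecord> records = List.of(
        new WeatherRecord(1949, 111, "station-A"),
        new WeatherRecord(1949, 78, "station-B"),
        new WeatherRecord(1950, 0, "station-A"),
        new WeatherRecord(1950, 22, "station-C"));

    // Print one line per year: year, maximum temperature, station.
    maxByYear(records).forEach((year, r) ->
        System.out.println(year + "\t" + r.temperature + "\t" + r.stationId));
  }
}
```

In a MapReduce job, the grouping by year is done by the shuffle rather than by an in-memory map, so the mapper only has to emit each record keyed by its year and the reducer only has to pick the maximum within each group.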
The program in Example 12-2 reads text input (in the format we saw in earlier chapters)
and writes Avro datafiles containing weather records as output.
Example 12-2. MapReduce program to find the maximum temperature, creating Avro output
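import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.Tool;

public class AvroGenericMaxTemperature extends Configured implements Tool {

  // The WeatherRecord schema, parsed once from its JSON definition.
  private static final Schema SCHEMA = new Schema.Parser().parse(
      "{" +
      " \"type\": \"record\"," +
      " \"name\": \"WeatherRecord\"," +
      " \"doc\": \"A weather reading.\"," +
      " \"fields\": [" +
      " {\"name\": \"year\", \"type\": \"int\"}," +
      " {\"name\": \"temperature\", \"type\": \"int\"}," +
      " {\"name\": \"stationId\", \"type\": \"string\"}" +
      " ]" +
      "}"
  );

  // Input is text (offset, line); output is a year key with a weather record value.
  public static class MaxTemperatureMapper
      extends Mapper<LongWritable, Text, AvroKey<Integer>,
          AvroValue<GenericRecord>> {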