Avro - Hadoop: The Definitive Guide


Schema Resolution

We can choose to read the data back with a different schema (the reader's schema) from the one we used to write it (the writer's schema). This is a powerful tool because it enables schema evolution. To illustrate, consider a new schema for string pairs with an added description field:

{
  "type": "record",
  "name": "StringPair",
  "doc": "A pair of strings with an added field.",
  "fields": [
    {"name": "left", "type": "string"},
    {"name": "right", "type": "string"},
    {"name": "description", "type": "string", "default": ""}
  ]
}

We can use this schema to read the data we serialized earlier because, crucially, we have given the description field a default value (the empty string), [82] which Avro will use when there is no such field defined in the records it is reading. Had we omitted the default attribute, we would get an error when trying to read the old data.
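The default-filling rule described above can be modeled in a few lines of plain Java. This is only a simplified sketch of the rule, not Avro's implementation: the class SchemaResolutionSketch, its nested Field type, and the resolve method are hypothetical names invented for illustration, records are represented as maps, and IllegalStateException stands in for whatever error the real library reports.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaResolutionSketch {

    /** Hypothetical stand-in for one field of a reader's schema (not an Avro class). */
    public static final class Field {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        private Field(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }

        /** A field with no "default" attribute. */
        public static Field required(String name) {
            return new Field(name, false, null);
        }

        /** A field with a "default" attribute; the default itself may be null. */
        public static Field withDefault(String name, Object value) {
            return new Field(name, true, value);
        }
    }

    /**
     * For each field the reader expects: use the written value if the writer
     * supplied one, otherwise fall back to the reader's default, otherwise fail.
     */
    public static Map<String, Object> resolve(Map<String, Object> written,
                                              List<Field> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field field : readerFields) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
            } else if (field.hasDefault) {
                result.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException(
                    "Field '" + field.name + "' was not written and has no default");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A record written with the old schema: only left and right.
        Map<String, Object> oldRecord = new LinkedHashMap<>();
        oldRecord.put("left", "L");
        oldRecord.put("right", "R");

        // The new reader's schema adds description with a default of "".
        List<Field> newFields = List.of(
            Field.required("left"),
            Field.required("right"),
            Field.withDefault("description", ""));
        System.out.println(resolve(oldRecord, newFields));

        // Without a default, reading the old record fails.
        List<Field> noDefault = List.of(
            Field.required("left"),
            Field.required("right"),
            Field.required("description"));
        try {
            resolve(oldRecord, noDefault);
        } catch (IllegalStateException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```

The sketch keeps "has a default" separate from the default's value, so a field whose default is null still resolves successfully instead of being treated as missing.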

NOTE

To make the default value null rather than the empty string, we would instead define the description field using a union with the null Avro type:

{"name": "description", "type": ["null", "string"], "default": null}

When the reader's schema is different from the writer's, we use the constructor for GenericDatumReader that takes two schema objects, the writer's and the reader's, in that order:

// schema is the writer's schema; newSchema is the reader's schema
DatumReader<GenericRecord> reader =
    new GenericDatumReader<GenericRecord>(schema, newSchema);
// out holds the bytes serialized earlier with the writer's schema
Decoder decoder =
    DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
GenericRecord result = reader.read(null, decoder);
assertThat(result.get("left").toString(), is("L"));
assertThat(result.get("right").toString(), is("R"));
// description was never written, so the reader's default is used
assertThat(result.get("description").toString(), is(""));
