For datafiles, which have the writer's schema stored in the metadata, we only need to
specify the reader's schema explicitly, which we can do by passing null for the writer's
schema:
DatumReader<GenericRecord> reader =
    new GenericDatumReader<GenericRecord>(null, newSchema);
Another common use of a different reader's schema is to drop fields in a record, an
operation called projection. This is useful when you have records with a large number of
fields and you want to read only some of them. For example, this schema can be used to get
only the right field of a StringPair:
{
  "type": "record",
  "name": "StringPair",
  "doc": "The right field of a pair of strings.",
  "fields": [
    {"name": "right", "type": "string"}
  ]
}
The rules for schema resolution have a direct bearing on how schemas may evolve from
one version to the next, and are spelled out in the Avro specification for all Avro types. A
summary of the rules for record evolution from the point of view of readers and writers
(or servers and clients) is presented in Table 12-4.
Table 12-4. Schema resolution of records

New schema     Writer  Reader  Action
Added field    Old     New     The reader uses the default value of the new field, since it is not written by the writer.
Added field    New     Old     The reader does not know about the new field written by the writer, so it is ignored (projection).
Removed field  Old     New     The reader ignores the removed field (projection).
Removed field  New     Old     The removed field is not written by the writer. If the old schema had a default defined for the field, the reader uses this; otherwise, it gets an error. In this case, it is best to update the reader's schema, either at the same time as or before the writer's.
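
The four cases in Table 12-4 can be illustrated with a small sketch in plain Java. This is a toy model of record resolution, not Avro's implementation or API: the class RecordResolutionSketch, the ReaderField helper, and the description field are hypothetical names chosen for illustration, and every value is treated as a string.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of record schema resolution (hypothetical; this is NOT the Avro API). */
final class RecordResolutionSketch {

    /** A field in the reader's schema: a name plus an optional default (null = no default). */
    static final class ReaderField {
        final String name;
        final String defaultValue;

        ReaderField(String name, String defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }

        static ReaderField required(String name) {
            return new ReaderField(name, null);
        }

        static ReaderField withDefault(String name, String defaultValue) {
            return new ReaderField(name, defaultValue);
        }
    }

    /**
     * Resolves a written record against the reader's fields:
     * - a field known to both sides is copied;
     * - a field only the reader knows takes its default, or fails if there is none;
     * - a field only the writer knows is never copied (projection).
     */
    static Map<String, String> resolve(Map<String, String> written, List<ReaderField> readerFields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (ReaderField field : readerFields) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
            } else if (field.defaultValue != null) {
                result.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException("No value and no default for field: " + field.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> pair = new LinkedHashMap<>();
        pair.put("left", "L");
        pair.put("right", "R");

        // Reader knows a field the writer did not write: the default is used.
        List<ReaderField> withDescription = Arrays.asList(
                ReaderField.required("left"),
                ReaderField.required("right"),
                ReaderField.withDefault("description", ""));
        System.out.println(resolve(pair, withDescription)); // {left=L, right=R, description=}

        // Writer wrote a field the reader does not declare: it is ignored (projection).
        List<ReaderField> rightOnly = Arrays.asList(ReaderField.required("right"));
        System.out.println(resolve(pair, rightOnly)); // {right=R}

        // Reader expects a field that was not written and has no default: an error.
        List<ReaderField> noDefault = Arrays.asList(
                ReaderField.required("right"),
                ReaderField.required("description"));
        try {
            resolve(pair, noDefault);
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

The sketch iterates over the reader's fields rather than the writer's, which is why writer-only fields drop out without any special handling and why a missing default only surfaces as an error at read time.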
Another useful technique for evolving Avro schemas is the use of name aliases. Aliases
allow you to use different names in the schema used to read the Avro data than in the