For datafiles, which have the writer's schema stored in the metadata, we only need to specify the reader's schema explicitly, which we can do by passing null for the writer's schema:
    DatumReader<GenericRecord> reader =
        new GenericDatumReader<GenericRecord>(null, newSchema);
Another common use of a different reader's schema is to drop fields in a record, an operation called projection. This is useful when you have records with a large number of fields and you want to read only some of them. For example, this schema can be used to get only the right field of a StringPair:
    {
      "type": "record",
      "name": "StringPair",
      "doc": "The right field of a pair of strings.",
      "fields": [
        {"name": "right", "type": "string"}
      ]
    }
The rules for schema resolution have a direct bearing on how schemas may evolve from one version to the next, and are spelled out in the Avro specification for all Avro types. A summary of the rules for record evolution from the point of view of readers and writers (or servers and clients) is presented in Table 12-4.
Table 12-4. Schema resolution of records
New schema     Writer  Reader  Action
-------------  ------  ------  ------------------------------------------------
Added field    Old     New     The reader uses the default value of the new
                               field, since it is not written by the writer.
               New     Old     The reader does not know about the new field
                               written by the writer, so it is ignored
                               (projection).
Removed field  Old     New     The reader ignores the removed field
                               (projection).
               New     Old     The removed field is not written by the writer.
                               If the old schema had a default defined for the
                               field, the reader uses this; otherwise, it gets
                               an error. In this case, it is best to update the
                               reader's schema, either at the same time as or
                               before the writer's.
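The record rules summarized in Table 12-4 can be illustrated with a small plain-Java sketch. It does not use the Avro library and is not how Avro implements resolution; it only models a record as a map of field names to values and a reader's schema as a map of field names to default values. The class name RecordResolutionSketch, the NO_DEFAULT marker, and the description field are hypothetical names introduced for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RecordResolutionSketch {

    /** Hypothetical marker meaning "this reader field declares no default". */
    static final Object NO_DEFAULT = new Object();

    /**
     * Resolves a written record against the reader's fields.
     * readerFields maps each field name to its default value, or to NO_DEFAULT.
     */
    static Map<String, Object> resolve(Map<String, Object> written,
                                       Map<String, Object> readerFields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerFields.entrySet()) {
            String name = field.getKey();
            if (written.containsKey(name)) {
                // Field known to both writer and reader: copy the written value.
                result.put(name, written.get(name));
            } else if (field.getValue() != NO_DEFAULT) {
                // Field missing from the written data: fall back to the reader's default.
                result.put(name, field.getValue());
            } else {
                // Field missing and no default: the reader gets an error.
                throw new IllegalStateException(
                    "Field '" + name + "' was not written and has no default");
            }
        }
        // Written fields that the reader does not declare are never copied (projection).
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> written = new LinkedHashMap<>();
        written.put("left", "L");
        written.put("right", "R");

        // Projection: the reader only asks for the right field.
        Map<String, Object> projection = new LinkedHashMap<>();
        projection.put("right", NO_DEFAULT);
        System.out.println(resolve(written, projection)); // {right=R}

        // Added field: the reader's schema has a new field with a default value.
        Map<String, Object> withAdded = new LinkedHashMap<>();
        withAdded.put("left", NO_DEFAULT);
        withAdded.put("right", NO_DEFAULT);
        withAdded.put("description", "");
        System.out.println(resolve(written, withAdded)); // {left=L, right=R, description=}
    }
}
```

Driving the loop from the reader's fields is what makes projection fall out for free: anything the reader does not name is simply never looked at, while anything it names must either be present in the written data or have a default.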
Another useful technique for evolving Avro schemas is the use of name aliases. Aliases allow you to use different names in the schema used to read the Avro data than in the