Schema Resolution
We can choose to use a different schema for reading the data back (the reader's schema) from the one we used to write it (the writer's schema). This is a powerful tool because it enables schema evolution. To illustrate, consider a new schema for string pairs with an added description field:
{
  "type": "record",
  "name": "StringPair",
  "doc": "A pair of strings with an added field.",
  "fields": [
    {"name": "left", "type": "string"},
    {"name": "right", "type": "string"},
    {"name": "description", "type": "string", "default": ""}
  ]
}
We can use this schema to read the data we serialized earlier because, crucially, we have given the description field a default value (the empty string) for Avro to use when there is no such field defined in the records it is reading. Had we omitted the default attribute, we would get an error when trying to read the old data.
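The rule just described can be modeled without Avro at all. The following is a minimal sketch, not Avro's actual implementation: the class `FieldResolutionSketch`, the `ReaderField` type, and the `resolve` method are all hypothetical names introduced for illustration, and records are simplified to maps of strings. For each field in the reader's schema it takes the written value if there is one, falls back to the default if one is declared, and fails otherwise.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldResolutionSketch {

    /** Hypothetical stand-in for one field of the reader's schema. */
    public static final class ReaderField {
        final String name;
        final boolean hasDefault;
        final String defaultValue;

        private ReaderField(String name, boolean hasDefault, String defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }

        /** A field with no "default" attribute. */
        public static ReaderField required(String name) {
            return new ReaderField(name, false, null);
        }

        /** A field that declares a "default" attribute. */
        public static ReaderField withDefault(String name, String defaultValue) {
            return new ReaderField(name, true, defaultValue);
        }
    }

    /**
     * Builds the record the reader sees: written values where present,
     * declared defaults where absent, and an error if neither exists.
     */
    public static Map<String, String> resolve(Map<String, String> written,
                                              List<ReaderField> readerFields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (ReaderField field : readerFields) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name));
            } else if (field.hasDefault) {
                result.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException(
                    "No value and no default for field: " + field.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Data written with the old schema: no description field.
        Map<String, String> written = new LinkedHashMap<>();
        written.put("left", "L");
        written.put("right", "R");

        // Reader's schema: description added with an empty-string default.
        List<ReaderField> readerFields = new ArrayList<>();
        readerFields.add(ReaderField.required("left"));
        readerFields.add(ReaderField.required("right"));
        readerFields.add(ReaderField.withDefault("description", ""));

        System.out.println(resolve(written, readerFields));
        // prints {left=L, right=R, description=}
    }
}
```

Replacing `ReaderField.withDefault("description", "")` with `ReaderField.required("description")` makes `resolve` throw, which mirrors the error the text describes for a new field with no default.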
NOTE
To make the default value null rather than the empty string, we would instead define the description field using a union with the null Avro type:
{"name": "description", "type": ["null", "string"], "default": null}
When the reader's schema is different from the writer's, we use the constructor for GenericDatumReader that takes two schema objects, the writer's and the reader's, in that order:
DatumReader<GenericRecord> reader =
    new GenericDatumReader<GenericRecord>(schema, newSchema);
Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
GenericRecord result = reader.read(null, decoder);
assertThat(result.get("left").toString(), is("L"));
assertThat(result.get("right").toString(), is("R"));
assertThat(result.get("description").toString(), is(""));
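The snippet above relies on schema, newSchema, and out being defined earlier. As a self-contained sketch of the whole round trip, the following writes a record with a writer's schema and reads it back with the new one. It assumes the Avro Java library is on the classpath and that the original writer's schema is a StringPair record with just the left and right string fields; the class name SchemaResolutionRoundTrip and the roundTrip method are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class SchemaResolutionRoundTrip {

    // Assumed writer's schema: the pair without a description field.
    static final String WRITER_JSON =
        "{\"type\":\"record\",\"name\":\"StringPair\",\"fields\":["
        + "{\"name\":\"left\",\"type\":\"string\"},"
        + "{\"name\":\"right\",\"type\":\"string\"}]}";

    // Reader's schema: adds description with an empty-string default.
    static final String READER_JSON =
        "{\"type\":\"record\",\"name\":\"StringPair\",\"fields\":["
        + "{\"name\":\"left\",\"type\":\"string\"},"
        + "{\"name\":\"right\",\"type\":\"string\"},"
        + "{\"name\":\"description\",\"type\":\"string\",\"default\":\"\"}]}";

    public static GenericRecord roundTrip() throws IOException {
        // A separate parser per schema, since both define the name StringPair.
        Schema schema = new Schema.Parser().parse(WRITER_JSON);
        Schema newSchema = new Schema.Parser().parse(READER_JSON);

        // Serialize one record using the writer's schema.
        GenericRecord datum = new GenericData.Record(schema);
        datum.put("left", "L");
        datum.put("right", "R");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
        Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(datum, encoder);
        encoder.flush();

        // Deserialize with both schemas: writer's first, then reader's.
        DatumReader<GenericRecord> reader =
            new GenericDatumReader<GenericRecord>(schema, newSchema);
        Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        return reader.read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        GenericRecord result = roundTrip();
        System.out.println("left=" + result.get("left"));
        System.out.println("right=" + result.get("right"));
        System.out.println("description=[" + result.get("description") + "]");
    }
}
```

The values are compared through toString() in the text's assertions because Avro's generic API typically returns strings as its own Utf8 type rather than java.lang.String.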