In-Memory Serialization and Deserialization
Avro provides APIs for serialization and deserialization that are useful when you want to
integrate Avro with an existing system, such as a messaging system where the framing
format is already defined. In other cases, consider using Avro's datafile format.
Let's write a Java program to read and write Avro data from and to streams. We'll start with
a simple Avro schema for representing a pair of strings as a record:
    {
      "type": "record",
      "name": "StringPair",
      "doc": "A pair of strings.",
      "fields": [
        {"name": "left", "type": "string"},
        {"name": "right", "type": "string"}
      ]
    }
If this schema is saved in a file on the classpath called StringPair.avsc (.avsc is the
conventional extension for an Avro schema), we can load it using the following two lines of code:
    Schema.Parser parser = new Schema.Parser();
    Schema schema = parser.parse(getClass().getResourceAsStream("StringPair.avsc"));
We can create an instance of an Avro record using the Generic API as follows:
    GenericRecord datum = new GenericData.Record(schema);
    datum.put("left", "L");
    datum.put("right", "R");
Next, we serialize the record to an output stream:
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
    Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    writer.write(datum, encoder);
    encoder.flush();
    out.close();
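To make the result of this step more concrete, the following is a minimal sketch of the bytes one would expect the binary encoder to produce for this record. It does not use the Avro library. It hand-rolls the rules from the Avro specification's binary encoding as understood here: a string is written as its UTF-8 length (a zigzag, variable-length long) followed by the UTF-8 bytes, and a record is simply its fields written in schema order. The class name StringPairBinarySketch and its methods are hypothetical, introduced only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: mimics Avro's binary layout for the
// StringPair record without depending on the Avro library.
public class StringPairBinarySketch {

    // Zigzag-encode a long, then write it as a variable-length integer
    // (7 bits per byte, least-significant group first).
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);             // zigzag mapping
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));  // high bit set: more bytes follow
            z >>>= 7;
        }
        out.write((int) z);
    }

    // A string: its length in bytes (as a long), then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record: its fields in schema order, with no field names or separators.
    static byte[] encodeStringPair(String left, String right) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, left);
        writeString(out, right);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encodeStringPair("L", "R")) {
            System.out.printf("%02x ", b);
        }
        System.out.println();  // prints: 02 4c 02 52
    }
}
```

Under these assumptions the record with "L" and "R" occupies four bytes: a length byte and one character for each field. No field names or type tags appear in the output, which is why a reader needs the schema to interpret the data.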
There are two important objects in the serialization code: the DatumWriter and the Encoder. A
DatumWriter translates data objects into the types understood by an Encoder, which
the latter writes to the output stream. Here we are using a GenericDatumWriter,