Interoperability
To demonstrate Avro's language interoperability, let's write a datafile using one language
(Python) and read it back with another (Java).
Python API
The program in Example 12-1 reads comma-separated strings from standard input and writes
them as StringPair records to an Avro datafile. As in the Java code for writing a datafile,
we create a DatumWriter and a DataFileWriter object. Notice that we have embedded the Avro
schema in the code, although we could equally well have read it from a file.
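As a rough sketch of the alternative of reading the schema from a file, the schema could be kept in its own file and loaded at startup. The file name StringPair.avsc and the helper load_schema_text are assumptions made for illustration and are not part of the book's example. The sketch uses only the standard library to read and sanity-check the JSON; the returned text is what would be handed to schema.parse in place of an embedded string.

```python
import json
import os
import tempfile

# The same StringPair schema, as it might appear in a schema file.
SCHEMA_JSON = """\
{ "type": "record",
  "name": "StringPair",
  "doc": "A pair of strings.",
  "fields": [
    {"name": "left", "type": "string"},
    {"name": "right", "type": "string"}
  ]
}"""


def load_schema_text(path):
    """Read schema JSON from path and check that it is well-formed.

    Hypothetical helper: returns the raw text, which would be passed
    to schema.parse instead of an embedded string.
    """
    with open(path) as f:
        text = f.read()
    json.loads(text)  # raises ValueError if the file is not valid JSON
    return text


# Write the schema to a temporary file (standing in for StringPair.avsc),
# then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    schema_path = os.path.join(tmp_dir, "StringPair.avsc")
    with open(schema_path, "w") as f:
        f.write(SCHEMA_JSON)
    schema_text = load_schema_text(schema_path)

field_names = [field["name"] for field in json.loads(schema_text)["fields"]]
print(field_names)
```

Keeping the schema in a separate file would let the Python writer and a reader in another language share one definition, at the cost of an extra file to distribute with the program.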
Python represents Avro records as dictionaries; each line read from standard input is
turned into a dict object and appended to the DataFileWriter.
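The line-to-dictionary step can be illustrated on its own, without the Avro library. This is a minimal sketch; the helper name line_to_record and the sample input lines are assumptions made for illustration.

```python
def line_to_record(line):
    """Turn one 'left,right' input line into a dict shaped like StringPair."""
    # strip() drops the trailing newline; split(',') separates the two fields.
    left, right = line.strip().split(',')
    return {'left': left, 'right': right}


# Sample lines standing in for what would arrive on standard input.
records = [line_to_record(line) for line in ["a,1\n", "c,2\n"]]
print(records)
```

Each resulting dictionary has exactly the keys named in the schema's fields. A line that does not contain exactly one comma would raise a ValueError at the unpacking step, so real input may need validation first.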
Example 12-1. A Python program for writing Avro record pairs to a datafile
import os
import sys

from avro import schema
from avro import io
from avro import datafile

if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit('Usage: %s <data_file>' % sys.argv[0])
    avro_file = sys.argv[1]
    # Avro datafiles are binary, so open the output in 'wb' mode.
    writer = open(avro_file, 'wb')
    datum_writer = io.DatumWriter()
    # The schema is embedded as a triple-quoted JSON string.
    schema_object = schema.parse("""\
{ "type": "record",
  "name": "StringPair",
  "doc": "A pair of strings.",
  "fields": [
    {"name": "left", "type": "string"},
    {"name": "right", "type": "string"}
  ]
}""")
    dfw = datafile.DataFileWriter(writer, datum_writer, schema_object)
    for line in sys.stdin.readlines():
        # str.split is used instead of string.split, which Python 3 lacks.
        left, right = line.strip().split(',')