Avro in Other Languages
For languages and frameworks other than Java, there are a few choices for working with Avro data.

AvroAsTextInputFormat is designed to allow Hadoop Streaming programs to read Avro datafiles. Each datum in the file is converted to a string, which is the JSON representation of the datum, or just the raw bytes if the type is Avro bytes. Going the other way, you can specify AvroTextOutputFormat as the output format of a Streaming job to create Avro datafiles with a bytes schema, where each datum is the tab-delimited key-value pair written from the Streaming output. Both of these classes can be found in the org.apache.avro.mapred package.
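As a sketch of what the Streaming side of such a job might look like: the mapper below reads the JSON datums that AvroAsTextInputFormat feeds to standard input, one per line, and emits tab-delimited key-value pairs of the kind AvroTextOutputFormat expects. The "year" and "temperature" field names are illustrative assumptions about the input records, not part of the Avro API.

```python
#!/usr/bin/env python
# A Hadoop Streaming mapper sketch. AvroAsTextInputFormat delivers each
# Avro datum as one line of JSON on stdin; we emit one tab-separated
# key-value line per datum, which AvroTextOutputFormat wraps into an
# Avro datafile with a bytes schema. The "year" and "temperature"
# fields are hypothetical, standing in for whatever the records hold.
import json
import sys

def map_line(line):
    """Parse one JSON datum and return a tab-delimited key-value string."""
    record = json.loads(line)
    return "%s\t%s" % (record["year"], record["temperature"])

if __name__ == "__main__" and not sys.stdin.isatty():
    # When run as a Streaming mapper, process datums from stdin.
    for line in sys.stdin:
        line = line.strip()
        if line:
            print(map_line(line))
```

The same pattern works for a reducer, which would receive the tab-delimited lines grouped by key.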
It's also worth considering other frameworks like Pig, Hive, Crunch, and Spark for Avro processing, since they can all read and write Avro datafiles by specifying the appropriate storage formats. See the relevant chapters in this book for details.
[80] Avro also performs favorably compared to other serialization libraries, as the benchmarks demonstrate.
[81] Avro can be downloaded in both source and binary forms. Get usage instructions for the Avro tools by typing java -jar avro-tools-*.jar.
[82] Default values for fields are encoded using JSON. See the Avro specification for a description of this encoding for each data type.
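To illustrate the point of this footnote, here is a minimal schema with a JSON-encoded field default; the record and field names are made up for the example.

```python
import json

# A minimal illustrative Avro record schema (the "Counter" and "count"
# names are invented for this example). Per the Avro specification, a
# field's default is given as a JSON value matching the field's type:
# here the JSON number 0 is the default for an int field.
schema = json.loads("""
{
  "type": "record",
  "name": "Counter",
  "fields": [
    {"name": "count", "type": "int", "default": 0}
  ]
}
""")

default = schema["fields"][0]["default"]
print(default)  # -> 0
```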
[83] A useful consequence of this property is that you can compute an Avro datum's hash code from either the object or the binary representation (the latter by using the static hashCode() method on BinaryData) and get the same result in both cases.
[84] …cificMaxTemperature class in the example code.
[85] If we had used the identity mapper and reducer here, the program would sort and remove duplicate keys at the same time. We encounter this idea of duplicating information from the key in the value object again in