Avro in Other Languages
For languages and frameworks other than Java, there are a few choices for working with Avro data.

AvroAsTextInputFormat is designed to allow Hadoop Streaming programs to read Avro datafiles. Each datum in the file is converted to a string, which is the JSON representation of the datum, or just the raw bytes if the type is Avro bytes. Going the other way, you can specify AvroTextOutputFormat as the output format of a Streaming job to create Avro datafiles with a bytes schema, where each datum is the tab-delimited key-value pair written from the Streaming output. Both of these classes can be found in the org.apache.avro.mapred package.
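As a sketch of what the Streaming side of such a job might look like: the mapper below reads the JSON datums that AvroAsTextInputFormat feeds to standard input, one per line, and emits tab-delimited key-value pairs of the kind AvroTextOutputFormat expects. The "year" and "temperature" field names are illustrative assumptions about the input records, not part of the Avro API.

```python
#!/usr/bin/env python
# A Hadoop Streaming mapper sketch. AvroAsTextInputFormat delivers each
# Avro datum as one line of JSON on stdin; we emit one tab-separated
# key-value line per datum, which AvroTextOutputFormat wraps into an
# Avro datafile with a bytes schema. The "year" and "temperature"
# fields are hypothetical, standing in for whatever the records hold.
import json
import sys

def map_line(line):
    """Parse one JSON datum and return a tab-delimited key-value string."""
    record = json.loads(line)
    return "%s\t%s" % (record["year"], record["temperature"])

if __name__ == "__main__" and not sys.stdin.isatty():
    # When run as a Streaming mapper, process datums from stdin.
    for line in sys.stdin:
        line = line.strip()
        if line:
            print(map_line(line))
```

The same pattern works for a reducer, which would receive the tab-delimited lines grouped by key.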
It's also worth considering other frameworks like Pig, Hive, Crunch, and Spark for Avro processing, since they can all read and write Avro datafiles by specifying the appropriate storage formats. See the relevant chapters in this book for details.
[80] Avro also performs favorably compared to other serialization libraries, as the benchmarks demonstrate.
[81] Avro can be downloaded in both source and binary forms. Get usage instructions for the Avro tools by typing java -jar avro-tools-*.jar.
[82] Default values for fields are encoded using JSON. See the Avro specification for a description of this encoding for each data type.
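To illustrate the point of this footnote, here is a minimal schema with a JSON-encoded field default; the record and field names are made up for the example.

```python
import json

# A minimal illustrative Avro record schema (the "Counter" and "count"
# names are invented for this example). Per the Avro specification, a
# field's default is given as a JSON value matching the field's type:
# here the JSON number 0 is the default for an int field.
schema = json.loads("""
{
  "type": "record",
  "name": "Counter",
  "fields": [
    {"name": "count", "type": "int", "default": 0}
  ]
}
""")

default = schema["fields"][0]["default"]
print(default)  # -> 0
```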
[83] A useful consequence of this property is that you can compute an Avro datum's hash code from either the object or the binary representation (the latter by using the static hashCode() method on BinaryData) and get the same result in both cases.
[84] …cificMaxTemperature class in the example code.
[85] If we had used the identity mapper and reducer here, the program would sort and remove duplicate keys at the same time. We encounter this idea of duplicating information from the key in the value object again in