Avro - Hadoop: The Definitive Guide


which passes the fields of GenericRecord to the Encoder. We pass a null to the encoder factory because we are not reusing a previously constructed encoder here. In this example, only one object is written to the stream, but we could call write() with more objects before closing the stream if we wanted to.

The GenericDatumWriter needs to be passed the schema because it follows the schema to determine which values from the data objects to write out. After we have called the writer's write() method, we flush the encoder, then close the output stream.
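As a rough sketch of the write path described above, the following class writes two records to the same stream before flushing and closing it. The inline schema text, the class name GenericWriteSketch, and the helper methods are assumptions made for illustration; only the StringPair record name and the left and right field names come from the text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class GenericWriteSketch {

  // Assumed schema: a record with two string fields, "left" and "right".
  static final String SCHEMA_JSON =
      "{\"type\": \"record\", \"name\": \"StringPair\", \"fields\": ["
      + "{\"name\": \"left\", \"type\": \"string\"},"
      + "{\"name\": \"right\", \"type\": \"string\"}]}";

  // Hypothetical helper that builds one generic record for the schema.
  static GenericRecord pair(Schema schema, String left, String right) {
    GenericRecord datum = new GenericData.Record(schema);
    datum.put("left", left);
    datum.put("right", right);
    return datum;
  }

  static byte[] writePairs() throws IOException {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
    ByteArrayOutputStream out = new ByteArrayOutputStream();

    // The writer is given the schema so it knows which fields to write.
    DatumWriter<GenericRecord> writer =
        new GenericDatumWriter<GenericRecord>(schema);

    // null: no previously constructed encoder is being reused.
    Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);

    // Several objects can be written before the stream is closed.
    writer.write(pair(schema, "L", "R"), encoder);
    writer.write(pair(schema, "A", "B"), encoder);

    // Flush the encoder's buffer, then close the underlying stream.
    encoder.flush();
    out.close();
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(writePairs().length + " bytes written");
  }
}
```

Forgetting the flush() call is an easy mistake here: the binary encoder buffers its output, so bytes may not reach the stream until it is flushed.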

We can reverse the process and read the object back from the byte buffer:

DatumReader<GenericRecord> reader =
    new GenericDatumReader<GenericRecord>(schema);
Decoder decoder =
    DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
GenericRecord result = reader.read(null, decoder);
assertThat(result.get("left").toString(), is("L"));
assertThat(result.get("right").toString(), is("R"));

We pass null to the calls to binaryDecoder() and read() because we are not reusing objects here (the decoder or the record, respectively).
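For contrast, here is a minimal sketch of what reuse might look like when several records are read from one buffer. It passes the previously returned record back into read() instead of null. The schema text, the class name GenericReuseSketch, and the method readLefts() are illustrative assumptions, not part of the original example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class GenericReuseSketch {

  // Assumed schema: a record with two string fields, "left" and "right".
  static final String SCHEMA_JSON =
      "{\"type\": \"record\", \"name\": \"StringPair\", \"fields\": ["
      + "{\"name\": \"left\", \"type\": \"string\"},"
      + "{\"name\": \"right\", \"type\": \"string\"}]}";

  static List<String> readLefts() throws IOException {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

    // Write two records into an in-memory buffer.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    GenericDatumWriter<GenericRecord> writer =
        new GenericDatumWriter<GenericRecord>(schema);
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    for (String[] p : new String[][] {{"L", "R"}, {"A", "B"}}) {
      GenericRecord datum = new GenericData.Record(schema);
      datum.put("left", p[0]);
      datum.put("right", p[1]);
      writer.write(datum, encoder);
    }
    encoder.flush();

    // Read them back, handing the previous record to read() for reuse.
    GenericDatumReader<GenericRecord> reader =
        new GenericDatumReader<GenericRecord>(schema);
    BinaryDecoder decoder =
        DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
    GenericRecord reuse = null; // nothing to reuse on the first call
    List<String> lefts = new ArrayList<String>();
    while (!decoder.isEnd()) {
      reuse = reader.read(reuse, decoder);
      // Copy the value out now: a reused record may be overwritten
      // by the next call to read().
      lefts.add(reuse.get("left").toString());
    }
    return lefts;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(readLefts());
  }
}
```

Reuse avoids allocating a new record for every datum, at the cost that values must be copied out before the next read.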

The objects returned by result.get("left") and result.get("right") are of type Utf8, so we convert them into Java String objects by calling their toString() methods.
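The conversion matters because a Utf8 object does not compare equal to a Java String with the same characters. The small sketch below illustrates this; the class name Utf8Sketch and the helper toJavaString() are hypothetical, and the Utf8 value is constructed directly to stand in for what result.get("left") would return.

```java
import org.apache.avro.util.Utf8;

public class Utf8Sketch {

  // Hypothetical helper: converts an Avro string value to a Java String.
  static String toJavaString(Object avroString) {
    return avroString == null ? null : avroString.toString();
  }

  public static void main(String[] args) {
    // Stands in for the value returned by result.get("left").
    Object value = new Utf8("L");

    // Comparing the Utf8 object directly with a String does not match...
    System.out.println(value.equals("L"));

    // ...but comparing after toString() does.
    System.out.println(toJavaString(value).equals("L"));
  }
}
```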

The Specific API

Let's look now at the equivalent code using the Specific API. We can generate the StringPair class from the schema file by using Avro's Maven plug-in for compiling schemas. The following is the relevant part of the Maven Project Object Model (POM):

...
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-maven-plugin</artifactId>
      <version>${avro.version}</version>
      <executions>
        <execution>
          <id>schemas</id>
          <phase>generate-sources</phase>
