Parquet - Hadoop: The Definitive Guide


To use a projection schema, set it on the configuration using the setRequestedProjection() static convenience method on AvroReadSupport:

Schema projectionSchema = parser.parse(
    getClass().getResourceAsStream("ProjectedStringPair.avsc"));
Configuration conf = new Configuration();
AvroReadSupport.setRequestedProjection(conf, projectionSchema);

Then pass the configuration into the constructor for AvroParquetReader:

AvroParquetReader<GenericRecord> reader =
    new AvroParquetReader<GenericRecord>(conf, path);
GenericRecord result = reader.read();
assertNull(result.get("left"));
assertThat(result.get("right").toString(), is("R"));

Both the Protocol Buffers and Thrift implementations support projection in a similar manner. In addition, the Avro implementation allows you to specify a reader's schema by calling setAvroReadSchema() on AvroReadSupport. This schema is used to resolve Avro records according to the rules listed in Table 12-4.

The reason that Avro has both a projection schema and a reader's schema is that the projection must be a subset of the schema used to write the Parquet file, so it cannot be used to evolve a schema by adding new fields.

The two schemas serve different purposes, and you can use both together. The projection schema is used to filter the columns to read from the Parquet file. Although it is expressed as an Avro schema, it can be viewed simply as a list of Parquet columns to read back. The reader's schema, on the other hand, is used only to resolve Avro records. It is never translated to a Parquet schema, since it has no bearing on which columns are read from the Parquet file. For example, if we added a description field to our Avro schema (as in Schema Resolution) and used it as the Avro reader's schema, then the records would contain the default value of the field, even though the Parquet file has no such field.
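The division of labor between the two schemas can be modeled in a few lines of plain Java. This is only a conceptual sketch, not the Parquet or Avro API: the class TwoSchemaSketch and the methods project() and resolve() are hypothetical names, rows are modeled as maps, and the stored value "L" and the empty-string default for description are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: maps stand in for Parquet rows and Avro records.
class TwoSchemaSketch {

    // Step 1, projection: keep only the requested columns. Every requested
    // column must exist in the stored row, mirroring the rule that the
    // projection is a subset of the schema used to write the file.
    static Map<String, Object> project(Map<String, Object> storedRow,
                                       List<String> projection) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String column : projection) {
            if (!storedRow.containsKey(column)) {
                throw new IllegalArgumentException(
                    "Projection is not a subset of the file schema: " + column);
            }
            out.put(column, storedRow.get(column));
        }
        return out;
    }

    // Step 2, resolution: build a record containing every reader field.
    // A field that was read from the file keeps its value; a field the file
    // does not have takes the default declared in the reader's schema.
    static Map<String, Object> resolve(Map<String, Object> projectedRow,
                                       Map<String, Object> readerDefaults) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            String name = field.getKey();
            out.put(name, projectedRow.containsKey(name)
                ? projectedRow.get(name)
                : field.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // The row as written to the file (values are assumed).
        Map<String, Object> storedRow = new LinkedHashMap<>();
        storedRow.put("left", "L");
        storedRow.put("right", "R");

        // Projection decides which columns are read at all.
        Map<String, Object> projected = project(storedRow, Arrays.asList("right"));

        // Reader's schema, modeled as field name -> default value.
        Map<String, Object> readerDefaults = new LinkedHashMap<>();
        readerDefaults.put("right", "");        // default unused: column was read
        readerDefaults.put("description", "");  // not in the file: default applies

        System.out.println(resolve(projected, readerDefaults));
    }
}
```

In this model, asking project() for a description column fails because the file never stored one, whereas resolve() supplies it from the default. That is the sense in which only the reader's schema can evolve a schema by adding fields.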
