Note that the 0 parameter passed to the getString() method specifies the index of the
value to retrieve within the field, since fields may have repeated values.
Avro, Protocol Buffers, and Thrift
Most applications will prefer to define models using a framework like Avro, Protocol
Buffers, or Thrift, and Parquet caters to all of these cases. Instead of ParquetWriter
and ParquetReader, use AvroParquetWriter, ProtoParquetWriter, or
ThriftParquetWriter, and the respective reader classes. These classes take care of
translating between Avro, Protocol Buffers, or Thrift schemas and Parquet schemas (as
well as performing the equivalent mapping between the framework types and Parquet
types), which means you don't need to deal with Parquet schemas directly.
Let's repeat the previous example, this time using the Avro Generic API, just as we did in
In-Memory Serialization and Deserialization. The Avro schema is:
{
  "type": "record",
  "name": "StringPair",
  "doc": "A pair of strings.",
  "fields": [
    {"name": "left", "type": "string"},
    {"name": "right", "type": "string"}
  ]
}
We create a schema instance and a generic record with:
Schema.Parser parser = new Schema.Parser();
Schema schema =
    parser.parse(getClass().getResourceAsStream("StringPair.avsc"));
GenericRecord datum = new GenericData.Record(schema);
datum.put("left", "L");
datum.put("right", "R");
Then we can write a Parquet file:
Path path = new Path("data.parquet");
AvroParquetWriter<GenericRecord> writer =
    new AvroParquetWriter<GenericRecord>(path, schema);
writer.write(datum);
writer.close();
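To make the schema translation concrete, here is a sketch of the Parquet schema one would expect the writer to derive from the StringPair Avro schema above. It assumes the standard Avro-to-Parquet mapping, in which a non-nullable Avro string becomes a required binary field annotated as a UTF-8 string. Older Parquet releases print the annotation as UTF8 and newer ones as STRING, so the exact text may differ by version:

```
message StringPair {
  required binary left (UTF8);
  required binary right (UTF8);
}
```

The message takes its name from the Avro record, and each Avro field maps to one Parquet column of the same name.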
AvroParquetWriter converts the Avro schema into a Parquet schema, and also translates
each Avro GenericRecord instance into the corresponding Parquet types to write