Note that the 0 parameter passed to the getString() method specifies the index of the value to retrieve within the field, since fields may have repeated values.
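To make the role of that index concrete, here is a minimal sketch in plain Java. It does not use Parquet's own classes: ToyGroup, add(), count(), and the field names are hypothetical stand-ins, chosen only to show that a field can hold several values and that the second argument selects one of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a record with repeatable fields.
// This is NOT Parquet's Group class; all names here are illustrative assumptions.
class ToyGroup {
    // Each field name maps to the list of values stored for it.
    private final Map<String, List<String>> values = new HashMap<>();

    // Append one more value to the named field and return this group for chaining.
    ToyGroup add(String field, String value) {
        values.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
        return this;
    }

    // Return the value at position `index` within the named field.
    String getString(String field, int index) {
        List<String> fieldValues = values.get(field);
        if (fieldValues == null || index < 0 || index >= fieldValues.size()) {
            throw new IndexOutOfBoundsException(
                "no value at index " + index + " for field " + field);
        }
        return fieldValues.get(index);
    }

    // Number of values currently stored for the named field (0 if absent).
    int count(String field) {
        List<String> fieldValues = values.get(field);
        return fieldValues == null ? 0 : fieldValues.size();
    }
}

class RepeatedFieldDemo {
    public static void main(String[] args) {
        ToyGroup group = new ToyGroup()
            .add("left", "L")   // a field with a single value
            .add("tag", "a")    // a field with two values
            .add("tag", "b");

        System.out.println(group.getString("left", 0)); // prints "L"
        System.out.println(group.getString("tag", 1));  // prints "b"
    }
}
```

For a field that holds exactly one value, index 0 is the only valid choice, which is why the example passes 0.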
Avro, Protocol Buffers, and Thrift
Most applications will prefer to define models using a framework like Avro, Protocol Buffers, or Thrift, and Parquet caters to all of these cases. Instead of ParquetWriter and ParquetReader, use AvroParquetWriter, ProtoParquetWriter, or ThriftParquetWriter, and the respective reader classes. These classes take care of translating between Avro, Protocol Buffers, or Thrift schemas and Parquet schemas (as well as performing the equivalent mapping between the framework types and Parquet types), which means you don't need to deal with Parquet schemas directly.
Let's repeat the previous example but using the Avro Generic API, just like we did in In-Memory Serialization and Deserialization.
The Avro schema is:
    {
      "type": "record",
      "name": "StringPair",
      "doc": "A pair of strings.",
      "fields": [
        {"name": "left", "type": "string"},
        {"name": "right", "type": "string"}
      ]
    }
We create a schema instance and a generic record with:
    Schema.Parser parser = new Schema.Parser();
    Schema schema = parser.parse(getClass().getResourceAsStream("StringPair.avsc"));

    GenericRecord datum = new GenericData.Record(schema);
    datum.put("left", "L");
    datum.put("right", "R");
Then we can write a Parquet file:
    Path path = new Path("data.parquet");
    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<GenericRecord>(path, schema);
    writer.write(datum);
    writer.close();
AvroParquetWriter converts the Avro schema into a Parquet schema, and also translates each Avro GenericRecord instance into the corresponding Parquet types to write