The following snippet shows how to create a Parquet file and write a message to it. The write() method would normally be called in a loop to write multiple messages to the file, but here it is called only once:
    Configuration conf = new Configuration();
    Path path = new Path("data.parquet");
    GroupWriteSupport writeSupport = new GroupWriteSupport();
    GroupWriteSupport.setSchema(schema, conf);
    ParquetWriter<Group> writer = new ParquetWriter<Group>(path, writeSupport,
        ParquetWriter.DEFAULT_COMPRESSION_CODEC_NAME,
        ParquetWriter.DEFAULT_BLOCK_SIZE,
        ParquetWriter.DEFAULT_PAGE_SIZE,
        ParquetWriter.DEFAULT_PAGE_SIZE, /* dictionary page size */
        ParquetWriter.DEFAULT_IS_DICTIONARY_ENABLED,
        ParquetWriter.DEFAULT_IS_VALIDATING_ENABLED,
        ParquetProperties.WriterVersion.PARQUET_1_0, conf);
    writer.write(group);
    writer.close();
The ParquetWriter constructor needs to be provided with a WriteSupport instance, which defines how the message type is translated to Parquet's types. In this case, we are using the Group message type, so GroupWriteSupport is used. Notice that the Parquet schema is set on the Configuration object by calling the setSchema() static method on GroupWriteSupport, and then the Configuration object is passed to ParquetWriter. This example also illustrates the Parquet file properties that may be set, corresponding to the ones listed in Table 13-3.
Reading a Parquet file is simpler than writing one, since the schema does not need to be specified as it is stored in the Parquet file. (It is, however, possible to set a read schema to return a subset of the columns in the file, via projection.) Also, there are no file properties to be set since they are set at write time:
    GroupReadSupport readSupport = new GroupReadSupport();
    ParquetReader<Group> reader = new ParquetReader<Group>(path, readSupport);
ParquetReader has a read() method to read the next message. It returns null when the end of the file is reached:
    Group result = reader.read();
    assertNotNull(result);
    assertThat(result.getString("left", 0), is("L"));
    assertThat(result.getString("right", 0), is("R"));
    assertNull(reader.read());
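Because read() signals the end of the file by returning null, a file with more than one message is usually drained with a loop that stops on the first null. The following is a minimal sketch of that loop, not code from the Parquet library. So that it runs without Parquet on the classpath, it uses a hypothetical MessageReader interface as a stand-in for ParquetReader<Group>, and the names ReadLoopSketch, readerOver, and readAll are likewise invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustrative only: MessageReader is a hypothetical stand-in for
// ParquetReader<Group>; it is not part of the Parquet API.
final class ReadLoopSketch {

    /** Minimal reader contract: read() returns the next message, or null at end of input. */
    interface MessageReader<T> {
        T read();
    }

    /**
     * Wraps an in-memory list so the loop can be exercised without a Parquet file.
     * The list must not contain null elements, since null marks the end of input.
     */
    static <T> MessageReader<T> readerOver(List<T> messages) {
        Iterator<T> it = messages.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Drains the reader by calling read() until it returns null. */
    static <T> List<T> readAll(MessageReader<T> reader) {
        List<T> results = new ArrayList<>();
        T message;
        while ((message = reader.read()) != null) {
            results.add(message);
        }
        return results;
    }

    public static void main(String[] args) {
        MessageReader<String> reader = readerOver(Arrays.asList("L", "R"));
        System.out.println(readAll(reader)); // prints [L, R]
    }
}
```

With a real ParquetReader<Group>, the same while loop would apply with Group in place of T, and the reader should be closed once the loop finishes.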