The following snippet shows how to create a Parquet file and write a message to
it. The write() method would normally be called in a loop to write multiple messages
to the file, but here it is called only once:
Configuration conf = new Configuration();
Path path = new Path("data.parquet");
GroupWriteSupport writeSupport = new GroupWriteSupport();
GroupWriteSupport.setSchema(schema, conf);
ParquetWriter<Group> writer = new ParquetWriter<Group>(path,
    writeSupport,
    ParquetWriter.DEFAULT_COMPRESSION_CODEC_NAME,
    ParquetWriter.DEFAULT_BLOCK_SIZE,
    ParquetWriter.DEFAULT_PAGE_SIZE,
    ParquetWriter.DEFAULT_PAGE_SIZE, /* dictionary page size */
    ParquetWriter.DEFAULT_IS_DICTIONARY_ENABLED,
    ParquetWriter.DEFAULT_IS_VALIDATING_ENABLED,
    ParquetProperties.WriterVersion.PARQUET_1_0, conf);
writer.write(group);
writer.close();
The ParquetWriter constructor needs to be provided with a WriteSupport
instance, which defines how the message type is translated to Parquet's types. In this case,
we are using the Group message type, so GroupWriteSupport is used. Notice that
the Parquet schema is set on the Configuration object by calling the setSchema()
static method on GroupWriteSupport, and then the Configuration object is
passed to ParquetWriter. This example also illustrates the Parquet file properties that
may be set, corresponding to the ones listed in Table 13-3.
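The writer snippet uses a schema and a group without showing how they are built. The following is a minimal sketch of one way to construct them, not necessarily the one used here. The schema text is an assumption inferred from the field names "left" and "right" that appear in the read assertions. The class and method names (PairExample, pairSchema, pairGroup) are hypothetical, and the org.apache.parquet package names assume an Apache-era release of the library.

```java
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class PairExample {
  // Assumed schema: two required UTF-8 string fields. The names are chosen
  // to match the "left" and "right" fields read back later in the text.
  static final String SCHEMA_TEXT =
      "message Pair {\n"
      + "  required binary left (UTF8);\n"
      + "  required binary right (UTF8);\n"
      + "}";

  // Parses the textual schema into a MessageType.
  static MessageType pairSchema() {
    return MessageTypeParser.parseMessageType(SCHEMA_TEXT);
  }

  // Builds one Group (a single message) that conforms to the schema.
  static Group pairGroup(MessageType schema, String left, String right) {
    return new SimpleGroupFactory(schema)
        .newGroup()
        .append("left", left)
        .append("right", right);
  }
}
```

With these helpers, schema would be PairExample.pairSchema() and group would be PairExample.pairGroup(schema, "L", "R").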
Reading a Parquet file is simpler than writing one, because the schema is stored in the
file and does not need to be specified. (It is, however, possible to set a read schema to
return a subset of the columns in the file, via projection.) There are also no file properties
to set, since they are fixed at write time:
GroupReadSupport readSupport = new GroupReadSupport();
ParquetReader<Group> reader = new ParquetReader<Group>(path,
    readSupport);
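The projection mentioned in the parenthetical can be sketched as follows. This is one possible approach, assuming an Apache-era release of the library in which GroupReadSupport picks up a read schema from the ReadSupport.PARQUET_READ_SCHEMA configuration key and ParquetReader offers a builder. The class name ProjectionExample, its methods, and the single-column schema text are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.hadoop.example.GroupReadSupport;

public class ProjectionExample {
  // Assumed read schema: only the "left" column of a two-column file.
  static final String LEFT_ONLY =
      "message Pair {\n"
      + "  required binary left (UTF8);\n"
      + "}";

  // Stores the requested read schema in a fresh Configuration.
  static Configuration projectionConf(String readSchema) {
    Configuration conf = new Configuration();
    conf.set(ReadSupport.PARQUET_READ_SCHEMA, readSchema);
    return conf;
  }

  // Opens a reader whose Group records contain only the projected columns.
  static ParquetReader<Group> openProjected(Path path, String readSchema)
      throws IOException {
    return ParquetReader.builder(new GroupReadSupport(), path)
        .withConf(projectionConf(readSchema))
        .build();
  }
}
```

The read schema must be compatible with the schema stored in the file; columns left out of it are not materialized in the returned groups.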
ParquetReader has a read() method to read the next message. It returns null
when the end of the file is reached:
Group result = reader.read();
assertNotNull(result);
assertThat(result.getString("left", 0), is("L"));
assertThat(result.getString("right", 0), is("R"));
assertNull(reader.read());
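Because read() returns null at the end of the file, a file with many messages is usually consumed in a loop. The sketch below shows that pattern. It assumes an Apache-era release of the library in which ParquetReader has a builder and is Closeable; the class ReadAll and the method readAll are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.example.GroupReadSupport;

public class ReadAll {
  // Reads every message in the file into memory. This is suitable only for
  // small files; larger ones should be processed one message at a time.
  static List<Group> readAll(Path path) throws IOException {
    List<Group> groups = new ArrayList<>();
    // try-with-resources closes the reader even if read() throws.
    try (ParquetReader<Group> reader =
             ParquetReader.builder(new GroupReadSupport(), path).build()) {
      Group group;
      // read() returns null once the end of the file is reached.
      while ((group = reader.read()) != null) {
        groups.add(group);
      }
    }
    return groups;
  }
}
```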