Data Model
Parquet defines a small number of primitive types, listed in Table 13-1.
Table 13-1. Parquet primitive types
Type                  Description
boolean               Binary value
int32                 32-bit signed integer
int64                 64-bit signed integer
int96                 96-bit signed integer
float                 Single-precision (32-bit) IEEE 754 floating-point number
double                Double-precision (64-bit) IEEE 754 floating-point number
binary                Sequence of 8-bit unsigned bytes
fixed_len_byte_array  Fixed number of 8-bit unsigned bytes
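To make the widths of the fixed-size numeric types concrete, here is a minimal Python sketch that packs values with the standard struct module. It assumes little-endian byte order, as used by Parquet's plain encoding; the names STRUCT_FORMATS, width_in_bits, and pack_value are hypothetical helpers for illustration and are not part of any Parquet library. It does not cover boolean, int96, or the two byte-array types.

```python
import struct

# Hypothetical lookup: Parquet fixed-width primitive type -> struct format.
# "<" selects little-endian with standard sizes, independent of the platform.
STRUCT_FORMATS = {
    "int32": "<i",   # 32-bit signed integer
    "int64": "<q",   # 64-bit signed integer
    "float": "<f",   # single-precision IEEE 754
    "double": "<d",  # double-precision IEEE 754
}

def width_in_bits(type_name):
    """Return the number of bits one value of the given type occupies."""
    return struct.calcsize(STRUCT_FORMATS[type_name]) * 8

def pack_value(type_name, value):
    """Serialize one value to its fixed-width byte representation."""
    return struct.pack(STRUCT_FORMATS[type_name], value)

for name in STRUCT_FORMATS:
    print(name, width_in_bits(name))
```

This only illustrates the sizes in the table; a real Parquet file adds encodings, compression, and page structure on top of the raw values.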
The data stored in a Parquet file is described by a schema, which has at its root a message
containing a group of fields. Each field has a repetition (required, optional, or
repeated), a type, and a name. Here is a simple Parquet schema for a weather record:
message WeatherRecord {
  required int32 year;
  required int32 temperature;
  required binary stationId (UTF8);
}
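The structure of such a schema (a named message holding fields, each with a repetition, a primitive type, a name, and an optional annotation) can be sketched in plain Python. This is an illustrative model only, not the API of any Parquet library: the Field and Message classes and their render methods are hypothetical, and the sketch ignores nested groups and the length argument that fixed_len_byte_array needs.

```python
from dataclasses import dataclass
from typing import List, Optional

REPETITIONS = {"required", "optional", "repeated"}
PRIMITIVE_TYPES = {
    "boolean", "int32", "int64", "int96",
    "float", "double", "binary", "fixed_len_byte_array",
}

@dataclass(frozen=True)
class Field:
    """Hypothetical model of one schema field."""
    repetition: str
    primitive_type: str
    name: str
    logical_type: Optional[str] = None  # annotation such as "UTF8"

    def __post_init__(self):
        # Reject anything that is not a known repetition or primitive type.
        if self.repetition not in REPETITIONS:
            raise ValueError(f"unknown repetition: {self.repetition}")
        if self.primitive_type not in PRIMITIVE_TYPES:
            raise ValueError(f"unknown primitive type: {self.primitive_type}")

    def render(self):
        annotation = f" ({self.logical_type})" if self.logical_type else ""
        return f"  {self.repetition} {self.primitive_type} {self.name}{annotation};"

@dataclass(frozen=True)
class Message:
    """Hypothetical model of the root message of a schema."""
    name: str
    fields: List[Field]

    def render(self):
        body = "\n".join(field.render() for field in self.fields)
        return f"message {self.name} {{\n{body}\n}}"

weather = Message("WeatherRecord", [
    Field("required", "int32", "year"),
    Field("required", "int32", "temperature"),
    Field("required", "binary", "stationId", logical_type="UTF8"),
])
print(weather.render())
```

Rendering the weather object reproduces the text of the schema shown above, which is a quick way to check that the model captures every part of each field declaration.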
Notice that there is no primitive string type. Instead, Parquet defines logical types that
specify how primitive types should be interpreted, so there is a separation between the
serialized representation (the primitive type) and the semantics that are specific to the
application (the logical type). Strings are represented as binary primitives with a UTF8
annotation. Some of the logical types defined by Parquet are listed in Table 13-2, along with a
representative example schema of each. Among those not listed in the table are signed
integers, unsigned integers, more date/time types, and JSON and BSON document types. See
the Parquet specification for details.
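The split between primitive and logical types can be illustrated with a short Python sketch. The stored value is always a byte sequence (the binary primitive); the annotation decides how a reader interprets it. The interpret function is a hypothetical helper written for this example, it handles only the UTF8 annotation, and the station identifier is just a sample value.

```python
def interpret(raw, logical_type=None):
    """Interpret a binary primitive according to its logical type.

    Hypothetical helper: only the UTF8 annotation is handled here.
    """
    if logical_type is None:
        # No annotation: the application sees the raw bytes unchanged.
        return raw
    if logical_type == "UTF8":
        # UTF8 annotation: the same bytes are read as a character string.
        return raw.decode("utf-8")
    raise NotImplementedError(f"logical type not handled: {logical_type}")

stored = "011990-99999".encode("utf-8")  # what is serialized
print(interpret(stored))                 # raw bytes
print(interpret(stored, "UTF8"))         # string semantics
```

The serialized bytes are identical in both calls; only the interpretation changes, which is the separation the logical type provides.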