Figure 4-3. HDF5 data pipeline, showing a dataset with GZIP and SHUFFLE filters applied
Compression Filters
A number of compression filters are available in HDF5. By far the most commonly used
is the GZIP filter. (You'll also hear this referred to as the “DEFLATE” filter; in the HDF5
world both names are used for the same filter.)
Here's an example of GZIP compression used on a floating-point dataset:
>>> dset = f.create_dataset("BigDataset", (1000, 1000), dtype='f', compression="gzip")
>>> dset.compression
'gzip'
By the way, you're not limited to floats. The great thing about GZIP compression is that
it works with all fixed-width HDF5 types, not just numeric types.
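As a rough illustration of the point about non-numeric types, here is a minimal sketch that compresses a dataset of fixed-width byte strings. The file name, the dataset name "Names", and the in-memory "core" driver are choices made for this example only; they are not part of the session above.

```python
import h5py
import numpy as np

# Hypothetical file and dataset names; the "core" driver with
# backing_store=False keeps the file in memory, so nothing is written to disk.
with h5py.File("string_demo.hdf5", "w", driver="core", backing_store=False) as f:
    # "S10" is a fixed-width, 10-byte string type
    names = f.create_dataset("Names", (1000,), dtype="S10", compression="gzip")
    names[...] = np.full(1000, b"hello", dtype="S10")

    filt = names.compression   # name of the filter applied to the dataset
    first = names[0]           # read back a single element
```

The dataset is created and accessed exactly as a float dataset would be; only the dtype changes.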
Compression is transparent; data is read and written normally:
>>> dset[...] = 42.0
>>> dset[0,0]
42.0
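Because GZIP is lossless, what you read back should be identical to what you wrote. The following sketch checks this with a round trip through a compressed dataset; the file name, the dataset name "Compressed", and the random test data are assumptions made for the example.

```python
import h5py
import numpy as np

# Arbitrary single-precision test data (seeded so the run is repeatable)
data = np.random.default_rng(0).random((100, 100)).astype('f')

# Hypothetical in-memory file; nothing is written to disk
with h5py.File("roundtrip_demo.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("Compressed", data=data, compression="gzip")

    # Reading uses ordinary slicing; decompression happens behind the scenes
    readback = dset[...]
    same = np.array_equal(readback, data)
```

Here same should come out True, since the filter changes only how the chunks are stored, not the values themselves.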
Investigating the Dataset object, we find a few more properties:
>>> dset.compression_opts
4
>>> dset.chunks
(63, 125)
The compression_opts property (and corresponding keyword to create_dataset) reflects any settings for the compression filter. In this case, the default GZIP level is 4.
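If the default level doesn't suit you, the same keyword can be supplied when the dataset is created. The sketch below requests GZIP level 9; the file name and the in-memory "core" driver are assumptions made so the example is self-contained.

```python
import h5py

# Hypothetical in-memory file; nothing is written to disk
with h5py.File("compression_demo.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("BigDataset", (1000, 1000), dtype='f',
                            compression="gzip", compression_opts=9)

    filt = dset.compression          # filter name
    level = dset.compression_opts    # the level requested above
    chunks = dset.chunks             # chunk shape chosen automatically
```

GZIP levels run from 0 to 9. Higher levels generally spend more CPU time in exchange for somewhat smaller files, so it is worth measuring on your own data before settling on one.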