Figure 4-3. HDF5 data pipeline, showing a dataset with GZIP and SHUFFLE filters applied
Compression Filters
A number of compression filters are available in HDF5. By far the most commonly used
is the GZIP filter. (You'll also hear this referred to as the “DEFLATE” filter; in the HDF5
world both names are used for the same filter.)
Here's an example of GZIP compression used on a floating-point dataset:
>>> dset = f.create_dataset("BigDataset", (1000, 1000), dtype='f', compression="gzip")
>>> dset.compression
'gzip'
By the way, you're not limited to floats. The great thing about GZIP compression is that
it works with all fixed-width HDF5 types, not just numeric types.
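As a minimal sketch of that point, the snippet below applies GZIP to an integer dataset, a fixed-width string dataset, and a compound (record) dataset. The file name, dataset names, and the in-memory "core" driver are illustrative choices made for this example, not part of the original text.

```python
import h5py
import numpy as np

# In-memory file so nothing is written to disk (illustrative choice).
f = h5py.File("gzip_types_demo.hdf5", "w", driver="core", backing_store=False)

# 64-bit integers
ints = f.create_dataset("ints", (1000,), dtype="i8", compression="gzip")
ints[...] = np.arange(1000)

# Fixed-width 10-byte strings
names = f.create_dataset("names", (100,), dtype="S10", compression="gzip")
names[...] = np.full(100, b"hello", dtype="S10")

# Compound type with an integer field and a float field
record_type = np.dtype([("id", "i4"), ("value", "f8")])
record_data = np.zeros(100, dtype=record_type)
record_data[0] = (7, 3.5)
records = f.create_dataset("records", (100,), dtype=record_type, compression="gzip")
records[...] = record_data

# Read everything back before closing; compression stays transparent.
filters_used = [ints.compression, names.compression, records.compression]
last_int = int(ints[999])
first_name = bytes(names[0])
first_record_id = int(records[0]["id"])

f.close()
```

Each dataset reports 'gzip' as its compression property, and values come back unchanged regardless of the element type.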
Compression is transparent; data is read and written normally:
>>> dset[...] = 42.0
>>> dset[0, 0]
42.0
Investigating the Dataset object, we find a few more properties:
>>> dset.compression_opts
4
>>> dset.chunks
(63, 125)
The compression_opts property (and the corresponding keyword to create_dataset) reflects
any settings for the compression filter. In this case, the default GZIP level is 4.
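To illustrate the keyword form, here is a small sketch that creates one dataset with the default level and one with compression_opts=9, then checks how many bytes the compressed data occupies. The file and dataset names are made up for the example, the data is deliberately all zeros so that it compresses well, and get_storage_size() is called on the low-level dset.id object.

```python
import h5py
import numpy as np

# In-memory file so nothing is written to disk (illustrative choice).
f = h5py.File("gzip_level_demo.hdf5", "w", driver="core", backing_store=False)

# All-zero data is highly compressible; real data will shrink far less.
data = np.zeros((1000, 1000), dtype="f")

# Default GZIP level
default = f.create_dataset("default_level", data=data, compression="gzip")

# Explicit GZIP level passed through compression_opts
highest = f.create_dataset("level_9", data=data, compression="gzip",
                           compression_opts=9)

default_level = default.compression_opts
highest_level = highest.compression_opts

# Uncompressed size in memory versus bytes actually allocated in the file
raw_bytes = data.nbytes
stored_bytes = default.id.get_storage_size()

f.close()
```

Higher levels generally trade more CPU time for somewhat smaller output, so the right setting depends on the data and on how often it is written.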
 