You'll notice that the auto-chunker has selected a chunk shape for us: (63, 125). Data is broken up into chunks of 63 × 125 elements × 4 bytes each, roughly 30 KiB apiece, before being handed to the compressor.
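If you want to confirm what the auto-chunker picked, the chunk shape is exposed on the dataset object itself. A minimal sketch, assuming dset refers to the compressed two-dimensional dataset of 4-byte elements just discussed:
>>> dset.chunks                # chunk shape chosen by the auto-chunker
(63, 125)
>>> dset.chunks[0] * dset.chunks[1] * dset.dtype.itemsize    # bytes per chunk
31500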
The following sections cover some of the available compression filters and the details of each.
Lots of filters exist for HDF5, and lots more are on the way. If you're
archiving data or sharing it with people, it's best to limit yourself to the
plain-vanilla GZIP, SHUFFLE, and FLETCHER32 filters, since they are
included with HDF5 itself.
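For example, enabling the FLETCHER32 checksum filter alongside GZIP needs nothing beyond a stock HDF5 build. A minimal sketch, assuming f is an open File object (the dataset name is made up for illustration):
>>> dset = f.create_dataset("ChecksummedData", (1000,), compression="gzip", fletcher32=True)
>>> dset.fletcher32
True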
GZIP/DEFLATE Compression
As we just saw, GZIP compression is by far the simplest and most portable compressor
in HDF5. It ships with every installation of HDF5, and has the following benefits:
• Works with all HDF5 types
• Built into HDF5 and available everywhere
• Moderate to slow speed compression
• Performance can be improved by also using SHUFFLE (see “SHUFFLE Filter” on page 52); a short sketch follows this list
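SHUFFLE reorders the bytes within each chunk so that bytes of equal significance are grouped together, which typically improves GZIP's compression ratio at very little cost. A minimal sketch of combining the two, assuming f is an open File object (the dataset name is made up for illustration):
>>> dset = f.create_dataset("ShuffledData", (1000,), compression="gzip", shuffle=True)
>>> dset.shuffle
True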
For the GZIP compressor, compression_opts may be an integer from 0 to 9, with a
default of 4.
>>> dset = f.create_dataset("Dataset", (1000,), compression="gzip")
GZIP is also invoked if you specify a number as the argument to compression:
>>> dset = f.create_dataset("Dataset2", (1000,), compression=9)
>>> dset.compression
'gzip'
>>> dset.compression_opts
9
SZIP Compression
SZIP is a patented compression technology used extensively by NASA. Generally you
only have to worry about this if you're exchanging files with people who use satellite
data. Because of patent licensing restrictions, many installations of HDF5 have the
compressor (but not the decompressor) disabled.
>>> dset = myfile.create_dataset("Dataset3", (1000,), compression="szip")
SZIP features: