You'll notice that the auto-chunker has selected a chunk shape for us: (63, 125). Data is broken up into chunks of 63 × 125 elements × 4 bytes each, roughly 30 KiB apiece, before being handed to the compressor.
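If you want to confirm what the auto-chunker picked, the chunk shape is exposed on the dataset object itself. A minimal sketch, assuming dset refers to the compressed two-dimensional dataset of 4-byte elements just discussed:
>>> dset.chunks                # chunk shape chosen by the auto-chunker
(63, 125)
>>> dset.chunks[0] * dset.chunks[1] * dset.dtype.itemsize    # bytes per chunk
31500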
The following sections cover some of the available compression filters and the details of each.
Lots of filters exist for HDF5, and lots more are on the way. If you're
archiving data or sharing it with people, it's best to limit yourself to the
plain-vanilla GZIP, SHUFFLE, and FLETCHER32 filters, since they are
included with HDF5 itself.
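For example, enabling the FLETCHER32 checksum filter alongside GZIP needs nothing beyond a stock HDF5 build. A minimal sketch, assuming f is an open File object (the dataset name is made up for illustration):
>>> dset = f.create_dataset("ChecksummedData", (1000,), compression="gzip", fletcher32=True)
>>> dset.fletcher32
True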
GZIP/DEFLATE Compression
As we just saw, GZIP compression is by far the simplest and most portable compressor
in HDF5. It ships with every installation of HDF5, and has the following benefits:
• Works with all HDF5 types
• Built into HDF5 and available everywhere
• Moderate to slow speed compression
• Performance can be improved by also using SHUFFLE (see “SHUFFLE Filter” on page 52); a short sketch follows this list
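SHUFFLE reorders the bytes within each chunk so that bytes of equal significance are grouped together, which typically improves GZIP's compression ratio at very little cost. A minimal sketch of combining the two, assuming f is an open File object (the dataset name is made up for illustration):
>>> dset = f.create_dataset("ShuffledData", (1000,), compression="gzip", shuffle=True)
>>> dset.shuffle
True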
For the GZIP compressor, compression_opts may be an integer from 0 to 9, with a
default of 4.
>>> dset = f.create_dataset("Dataset", (1000,), compression="gzip")
GZIP is also invoked if you specify a number as the argument to compression:
>>> dset = f.create_dataset("Dataset2", (1000,), compression=9)
>>> dset.compression
'gzip'
>>> dset.compression_opts
9
SZIP Compression
SZIP is a patented compression technology used extensively by NASA. Generally you
only have to worry about this if you're exchanging files with people who use satellite
data. Because of patent licensing restrictions, many installations of HDF5 have the
compressor (but not the decompressor) disabled.
>>> dset = myfile.create_dataset("Dataset3", (1000,), compression="szip")
SZIP features: