• Integer (1, 2, 4, 8 byte; signed/unsigned) and floating-point (4/8 byte) types only
• Fast compression and decompression
• A decompressor that is almost always available
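The first restriction in the list above is easy to check before choosing SZIP. The sketch below makes some assumptions. `szip_compatible` is a hypothetical helper that encodes only the type limits listed here. It also calls h5py's low-level `h5z.filter_avail` to ask whether the installed HDF5 build includes SZIP at all.

```python
import numpy as np
from h5py import h5z


def szip_compatible(dtype):
    """Hypothetical helper: True if SZIP's type limits (listed above) allow dtype.

    SZIP accepts 1/2/4/8-byte integers (signed or unsigned) and
    4/8-byte floats only; anything else is rejected here.
    """
    dt = np.dtype(dtype)
    if dt.kind in "iu":                 # signed or unsigned integers
        return dt.itemsize in (1, 2, 4, 8)
    if dt.kind == "f":                  # floating point
        return dt.itemsize in (4, 8)
    return False                        # strings, compounds, bools, ...


# Whether this particular h5py/HDF5 build can use SZIP at all.
print("SZIP filter available:", h5z.filter_avail(h5z.FILTER_SZIP))
print(szip_compatible("i2"), szip_compatible("f2"), szip_compatible("S10"))
```

A dtype such as float16 (a 2-byte float) fails the check, so one of the other filters would be needed for it.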
LZF Compression
For files you'll only be using from Python, LZF is a good choice. It ships with h5py, and C
source code is available to third-party programs under the BSD license. It is optimized
for very fast compression at the cost of a lower compression ratio than GZIP. It works
best when your dataset contains large amounts of redundant data. There are no
compression_opts for this filter.
>>> dset = myfile.create_dataset("Dataset4", (1000,), compression="lzf")
LZF compression:
• Works with all HDF5 types
• Features fast compression and decompression
• Is only available in Python (ships with h5py); C source available
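Here is a fuller, self-contained version of the LZF example. It is a minimal sketch that assumes an in-memory file (h5py's `core` driver with `backing_store=False`), so nothing touches disk. The file name is hypothetical. The data is deliberately redundant, which the text names as the case LZF handles best. `dset.id.get_storage_size()` reports how many bytes the compressed chunks actually occupy.

```python
import h5py
import numpy as np

# Hypothetical in-memory file: the "core" driver with backing_store=False keeps
# everything in RAM, so "lzf_demo.h5" is never created on disk.
with h5py.File("lzf_demo.h5", "w", driver="core", backing_store=False) as f:
    # Deliberately redundant data: a 10-value pattern repeated 1000 times.
    data = np.tile(np.arange(10, dtype="f4"), 1000)

    # No compression_opts for LZF; h5py chooses a chunk shape automatically.
    dset = f.create_dataset("Dataset4", data=data, compression="lzf")
    f.flush()  # write compressed chunks out of the chunk cache before measuring

    stored = dset.id.get_storage_size()  # bytes occupied by the stored chunks
    print(dset.compression, f"{data.nbytes} raw bytes -> {stored} stored bytes")

    restored = dset[...]

assert np.array_equal(restored, data)  # LZF is lossless
```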
Performance
As always, you should run your own performance tests to see which parts of your
application would benefit from attention. Still, the examples below give an idea of
how the various filters stack up. In this experiment (see h5py.org/lzf for details),
a 4 MB dataset of single-precision floats was compressed with the LZF, GZIP, and SZIP
filters, using a chunk size of 190 KiB.
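If you want to run a comparable test on your own data, a harness like the one below is one option. It is a hypothetical sketch, not the script used for this experiment. `benchmark` is an illustrative name. It times a single write and a single read with `time.perf_counter`, and it reopens each file before reading so the read really goes through decompression. SZIP is left out because not every h5py build includes it. Absolute timings depend on your machine and will not match the tables.

```python
import os
import tempfile
import time

import h5py
import numpy as np


def benchmark(data, chunk_bytes=190 * 1024, filters=(None, "lzf", "gzip")):
    """Hypothetical harness: time one write and one read of a 1-D array per filter.

    Returns {filter: (write_ms, read_ms, percent_reduction)}.
    """
    # Turn the byte budget into a chunk length (190 KiB of float32 = 48640 items),
    # clamped so the chunk never exceeds the dataset length.
    chunk_len = max(1, min(len(data), chunk_bytes // data.dtype.itemsize))
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for comp in filters:
            path = os.path.join(tmp, f"{comp}.h5")
            with h5py.File(path, "w") as f:
                dset = f.create_dataset("data", shape=data.shape, dtype=data.dtype,
                                        chunks=(chunk_len,), compression=comp)
                t0 = time.perf_counter()
                dset[...] = data
                f.flush()  # push cached chunks through the filter to disk
                write_ms = (time.perf_counter() - t0) * 1e3
            # Reopen so the read starts with an empty chunk cache and must decompress.
            with h5py.File(path, "r") as f:
                dset = f["data"]
                t0 = time.perf_counter()
                out = dset[...]
                read_ms = (time.perf_counter() - t0) * 1e3
                stored = dset.id.get_storage_size()
            assert np.array_equal(out, data)
            # Edge chunks are allocated at full size, so the uncompressed case
            # can come out slightly negative when len(data) is not a chunk multiple.
            results[comp] = (write_ms, read_ms, 100.0 * (1 - stored / data.nbytes))
    return results


data = np.arange(1024000, dtype="f4")  # ~4 MB of single-precision floats
results = benchmark(data)
for comp, (w, r, pct) in results.items():
    print(f"{str(comp):>5}: write {w:6.1f} ms, read {r:6.1f} ms, compressed by {pct:6.2f}%")
```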
First, each data element was set to its own index (see Table 4-1):
>>> data[...] = np.arange(1024000)
Table 4-1. Compression of trivial data

Compressor   Compression time (ms)   Decompression time (ms)   Size reduction
None         10.7                    6.5                       0.00%
LZF          18.6                    17.8                      96.66%
GZIP         58.1                    40.5                      98.53%
SZIP         63.1                    61.3                      72.68%
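The last column of Table 4-1 gives the percentage by which each filter shrank the data: 100 * (1 - compressed size / original size). The sketch below computes that figure. As a rough cross-check, it also applies Python's `zlib` to one 190 KiB slice of the trivial data. `zlib` implements DEFLATE, the algorithm behind the GZIP filter, and the sketch uses level 4, which is h5py's default GZIP level. `percent_reduction` is a hypothetical helper. The result will not reproduce the table exactly, because the experiment's GZIP level and other settings are not given here.

```python
import zlib

import numpy as np


def percent_reduction(raw_bytes, stored_bytes):
    """Hypothetical helper: space saved, as a percentage of the uncompressed size."""
    return 100.0 * (1.0 - stored_bytes / raw_bytes)


# Same trivial data as Table 4-1: each float32 element holds its own index.
raw = np.arange(1024000, dtype="f4").tobytes()

# Compress one 190 KiB slice, roughly what a single chunk would see.
chunk = raw[:190 * 1024]
packed = zlib.compress(chunk, 4)  # level 4, assumed to match h5py's GZIP default
print(f"{len(chunk)} -> {len(packed)} bytes, "
      f"compressed by {percent_reduction(len(chunk), len(packed)):.2f}%")
```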
Next, a sine wave with added noise was tested (see Table 4-2):
>>> data[...] = np.sin(np.arange(1024000) / 32.) + (np.random.random(1024000) * 0.5 - 0.25)
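Noise should hurt compression, because the low-order bits of each float become essentially random. One quick way to see this is to compare the compressed-to-original size ratio for the trivial data and the noisy sine wave. This sketch again uses `zlib` as a stand-in for the GZIP filter. It seeds NumPy's generator with `np.random.default_rng(0)`, which is not in the original, so results are repeatable. `ratio` is a hypothetical helper. The sketch says nothing about the measured numbers in Table 4-2.

```python
import zlib

import numpy as np

n = 1024000
rng = np.random.default_rng(0)  # seeded so reruns give the same "noise"

trivial = np.arange(n, dtype="f4")
# Same recipe as above: sine wave plus uniform noise in [-0.25, 0.25),
# cast to float32 as it would be when written to a single-precision dataset.
noisy = (np.sin(np.arange(n) / 32.0) + (rng.random(n) * 0.5 - 0.25)).astype("f4")


def ratio(a):
    """Hypothetical helper: compressed size / original size under zlib level 4."""
    raw = a.tobytes()
    return len(zlib.compress(raw, 4)) / len(raw)


print(f"trivial: {ratio(trivial):.3f}  noisy: {ratio(noisy):.3f}")
```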
 