>>> dset2 = f.create_dataset('timetraces4', (5000, 1000), maxshape=(None, 1000),
...                          chunks=(1, 1000))
>>> timeit(test1, setup=setup, number=1)
3.0520598888397217
>>> timeit(test2, setup=setup, number=1)
2.5036721229553223
Much better. Next, we explore the “killer app” for chunks, arguably even more important
than their role in boosting performance: filters.
Filters and Compression
If you wanted to compress a contiguous dataset, you would quickly realize that the entire
thing would have to be decompressed and recompressed every time you wrote an element.
That's because there's no simple way to index into a compressed dataset using
offsets, like you can with an uncompressed contiguous dataset. After all, the point of
compression is that you end up with a variable-sized output depending on the values
involved.
With chunking, it becomes possible to transparently perform compression on a dataset.
The initial size of each chunk is known, and since they're indexed by a B-tree they can
be stored anywhere in the file, not just one after another. In other words, each chunk is
free to grow or shrink without banging into the others.
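The idea can be illustrated without HDF5 at all. The following is a minimal sketch, assuming a plain dictionary as a stand-in for the B-tree chunk index and zlib as the compressor; the names chunk_index, write_chunk, and read_chunk are hypothetical and are not part of h5py. It shows that two chunks with the same uncompressed size compress to different sizes, and that one chunk can be rewritten without touching the other.

```python
import random
import zlib

CHUNK_LEN = 1000  # bytes per uncompressed chunk (arbitrary choice for this sketch)

# Hypothetical stand-in for the B-tree: chunk coordinate -> compressed bytes.
chunk_index = {}

def write_chunk(coord, raw):
    """Compress one chunk and store it under its coordinate."""
    chunk_index[coord] = zlib.compress(raw)

def read_chunk(coord):
    """Look up one chunk by coordinate and decompress it."""
    return zlib.decompress(chunk_index[coord])

# Two chunks with the same uncompressed size but very different content.
constant_chunk = bytes(CHUNK_LEN)  # all zeros, highly compressible
rng = random.Random(0)
noisy_chunk = bytes(rng.randrange(256) for _ in range(CHUNK_LEN))

write_chunk((0,), constant_chunk)
write_chunk((1,), noisy_chunk)
sizes_before = {coord: len(blob) for coord, blob in chunk_index.items()}

# Rewriting chunk 0 changes its stored size; chunk 1 is left alone.
chunk1_blob = chunk_index[(1,)]
write_chunk((0,), noisy_chunk)
sizes_after = {coord: len(blob) for coord, blob in chunk_index.items()}
```

Because each chunk is located through the index rather than by a fixed offset, the stored size of chunk 0 can change freely while chunk 1 stays byte-for-byte identical.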
The Filter Pipeline
HDF5 has the concept of a filter pipeline, which is just a series of operations performed
on each chunk when it's written. Each filter is free to do anything it wants to the data in
the chunk: compress it, checksum it, add metadata, anything. When the file is read, each
filter is run in “reverse” mode to reconstruct the original data.
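As a rough illustration of the forward and reverse passes, here is a minimal pure-Python sketch. It is not how HDF5 implements filters internally; the classes DeflateFilter and ChecksumFilter and the helpers write_chunk and read_chunk are hypothetical names chosen for this example, and zlib stands in for whatever compressor a real pipeline would use.

```python
import struct
import zlib

class DeflateFilter:
    """Compresses on write, decompresses on read."""
    def forward(self, data):
        return zlib.compress(data)

    def reverse(self, data):
        return zlib.decompress(data)

class ChecksumFilter:
    """Appends a CRC32 on write; verifies and strips it on read."""
    def forward(self, data):
        return data + struct.pack("<I", zlib.crc32(data))

    def reverse(self, data):
        payload = data[:-4]
        (stored_crc,) = struct.unpack("<I", data[-4:])
        if zlib.crc32(payload) != stored_crc:
            raise ValueError("checksum mismatch")
        return payload

def write_chunk(raw, pipeline):
    """Run every filter in order, as happens when a chunk is written."""
    for flt in pipeline:
        raw = flt.forward(raw)
    return raw

def read_chunk(stored, pipeline):
    """Run every filter in reverse order to recover the original chunk."""
    for flt in reversed(pipeline):
        stored = flt.reverse(stored)
    return stored

pipeline = [DeflateFilter(), ChecksumFilter()]
original = bytes(range(256)) * 4
stored = write_chunk(original, pipeline)
recovered = read_chunk(stored, pipeline)
```

The order matters: the checksum here is computed over the compressed bytes, so on read it is verified first, before any decompression is attempted.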
Figure 4-3 shows schematically how this works. You'll notice that since the atomic unit
of data here is the chunk, reading or writing any data (even a single element) will result
in decompression of at least one entire chunk. This is one thing to keep in mind when
selecting a chunk shape, or choosing whether compression is right for your application.
Finally, you have to specify your filters when the dataset is created, and they can never
change.
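In h5py, filters are requested through keyword arguments to create_dataset. The sketch below assumes h5py and NumPy are installed; the file name, dataset name, shape, and gzip level are arbitrary choices for illustration, and the file is kept in memory (core driver with no backing store) so that nothing is written to disk.

```python
import numpy as np
import h5py

# In-memory file: the "core" driver with backing_store=False never touches disk.
with h5py.File("filters_demo.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "timetraces_gzip",      # hypothetical dataset name
        shape=(100, 1000),
        dtype="f4",
        chunks=(1, 1000),       # one row per chunk
        compression="gzip",     # filters are chosen here, at creation time
        compression_opts=4,     # gzip level, an arbitrary choice
        fletcher32=True,        # add a checksum filter as well
    )

    # Writing one row compresses exactly one chunk.
    dset[0, :] = np.arange(1000, dtype="f4")

    # The filter settings can be read back from the dataset.
    settings = (dset.chunks, dset.compression,
                dset.compression_opts, dset.fletcher32)

    # Reading passes the chunk back through the pipeline transparently.
    first_row = dset[0, :]
```

Reads and writes use ordinary slicing; the compression and checksum steps stay invisible to the calling code.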