Databases Reference
In-Depth Information
CHAPTER 4
How Chunking and Compression
Can Help You
So far we have avoided talking about exactly how the data you write is stored on disk.
Some of the most interesting features in HDF5, including per-dataset compression, are
tied up in the details of how data is arranged on disk.
Before we get down to the nuts and bolts, there's a more fundamental issue we have to
discuss: how multidimensional arrays are actually handled in Python and HDF5.
Contiguous Storage
Let's suppose we have a four-element NumPy array of strings:
>>> a = np . array ([ [ "A" , "B" ], [ "C" , "D" ] ])
>>> print a
[['A' 'B']
['C' 'D']]
Mathematically, this is a two-dimensional object. It has two axes, and can be indexed
using a pair of numbers in the range 0 to 1:
>>> a [ 1 , 1 ]
'D'
However, there's no such thing as “two-dimensional” computer memory, at least not in
common use. The elements are actually stored in a one-dimensional buffer:
'A' 'B' 'C' 'D'
This is called contiguous storage, because all the elements of the array, whether it's stored
on disk or in memory, are stored one after another. NumPy uses a simple set of rules to
turn an indexing expression into the appropriate offset into this one-dimensional buffer.
 
Search WWH ::




Custom Search