array([[0, 0],
       [0, 0]])
For some applications, it's nice to pick a default value other than 0. You might want to
set unmodified elements to -1, or even NaN for floating-point datasets.
HDF5 addresses this with a fill value, which is the value returned for the areas of a dataset
that haven't been written to. Fill values are handled when data is read, so they don't cost
you anything in terms of storage space. They're defined when the dataset is created, and
can't be changed:
>>> dset = f.create_dataset('filled', (2, 2), dtype=np.int32, fillvalue=42)
>>> dset[...]
array([[42, 42],
       [42, 42]])
A dataset's fill value is available on the fillvalue property:
>>> dset.fillvalue
42
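Floating-point datasets can likewise use NaN as a fill value, as mentioned above. Here's a minimal sketch; the dataset name is just for illustration:
>>> dset_nan = f.create_dataset('filled_nan', (2, 2), dtype=np.float64,
...                             fillvalue=np.nan)  # name is illustrative
>>> dset_nan[...]
array([[nan, nan],
       [nan, nan]])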
Reading and Writing Data
Your main day-to-day interaction with Dataset objects will look a lot like your
interactions with NumPy arrays. One of the design goals for the h5py package was to “recycle”
as many NumPy metaphors as possible for datasets, so that you can interact with them
in a familiar way.
Even if you're an experienced NumPy user, don't skip this section! There are important
performance differences and implementation subtleties between the two that may trip
you up.
Before we dive into the nuts and bolts of reading from and writing to datasets, it's
important to spend a few minutes discussing how Dataset objects aren't like NumPy
arrays, especially from a performance perspective.
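To make one such difference concrete, here's a minimal sketch (the file and dataset names are hypothetical): every element access on a Dataset is a separate trip through the HDF5 library, so looping over elements one at a time is dramatically slower than reading a slice into a NumPy array first and working in memory:
>>> import numpy as np
>>> import h5py
>>> f3 = h5py.File('perf_demo.hdf5', 'w')  # hypothetical file
>>> dset2 = f3.create_dataset('data', (1000,), dtype=np.float32)
>>> total = 0.0
>>> for i in range(1000):  # slow: each index is a separate HDF5 read
...     total += dset2[i]
...
>>> arr = dset2[...]       # fast: one read into an in-memory NumPy array
>>> total = arr.sum()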
Using Slicing Effectively
In order to use Dataset objects efficiently, we have to know a little about what goes on
behind the scenes. Let's take the example of reading from an existing dataset. Suppose
we have the (100, 1000)-shape array from the previous example:
>>> dset = f2['big']
>>> dset
<HDF5 dataset "big": shape (100, 1000), type "<f4">
Now we request a slice:
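For example (a sketch; the slice ranges here are illustrative):
>>> out = dset[0:10, 20:70]  # illustrative ranges
>>> out.shape
(10, 50)
The result is a regular NumPy array holding a copy of the selected region; the actual read from disk happens at the moment the slice is requested.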