Working with Datasets - Python and HDF5


the data across all 100 remaining indices. It's as efficient as you can get; there's only one

slicing operation, and the remainder of the time is spent writing data to disk.

Reading Directly into an Existing Array

Finally we come full circle back to read_direct, one of the most powerful methods

available on the Dataset object. It's as close as you can get to the “traditional” C interface

of HDF5, without getting into the internal details of h5py.

To recap, you can use read_direct to have HDF5 “fill in” data into an existing array,

automatically performing type conversion. Last time we saw how to read float32 data

into a float64 NumPy array:

>>> dset.dtype
dtype('float32')
>>> out = np.empty((100, 1000), dtype=np.float64)
>>> dset.read_direct(out)

This works, but requires you to read the entire dataset in one go. Let's pick a more useful

example. Suppose we wanted to read the first time trace, at dset[0,:], and deposit it

into the out array at out[50,:]. We can use the source_sel and dest_sel keywords,

for source selection and destination selection respectively:

>>> dset.read_direct(out, source_sel=np.s_[0,:], dest_sel=np.s_[50,:])
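
To check what the call above actually touches, here is a minimal sketch. It assumes an in-memory file (h5py's "core" driver with backing_store=False), and the file name, the dataset name "traces", and the random float32 contents are all made up for illustration. It starts from a zero-filled destination, so it is easy to see that only row 50 is written.

```python
import numpy as np
import h5py

# Hypothetical in-memory file; nothing is written to disk.
f = h5py.File("sel_demo.h5", "w", driver="core", backing_store=False)
dset = f.create_dataset(
    "traces", data=np.random.rand(100, 1000).astype(np.float32)
)

# Zero-filled float64 destination so untouched rows are easy to detect.
out = np.zeros((100, 1000), dtype=np.float64)

# Copy dataset row 0 into destination row 50; HDF5 converts float32 -> float64.
dset.read_direct(out, source_sel=np.s_[0, :], dest_sel=np.s_[50, :])

# Compare against an ordinary sliced read of the same row.
row50_matches = np.array_equal(out[50], dset[0, :].astype(np.float64))
others_untouched = (not out[:50].any()) and (not out[51:].any())

f.close()
```

Both flags should come out True: row 50 holds the converted trace, and every other row of out is still zero.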

The odd-looking np.s_ is a gadget that takes slices, in the ordinary array-slicing syntax,

and returns the equivalent slice object, or a tuple of indices and slices when more than one axis is given.
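
A short sketch of what np.s_ produces may help here. It uses only NumPy, and the variable names are illustrative.

```python
import numpy as np

# A single range gives a plain Python slice object.
single = np.s_[0:50]

# Several axes give a tuple, here an integer index plus a full slice.
multi = np.s_[0, :]

# A stored selection can be used anywhere the inline syntax could be.
arr = np.arange(12).reshape(3, 4)
first_row = arr[multi]          # same as arr[0, :]
```

Because the result is an ordinary object, a selection can be built once, stored in a variable, and passed to read_direct as source_sel or dest_sel.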

By the way, you don't have to match the shape of your output array to the dataset. Suppose

our application wanted to compute the mean of the first 50 data points in each time

trace, a common scenario when estimating DC offsets in real-world experimental data.

You could do this using the standard slicing techniques:

>>> out = dset[:,0:50]
>>> out.shape
(100, 50)
>>> means = out.mean(axis=1)
>>> means.shape
(100,)

Using read_direct this would look like:

>>> out = np.empty((100, 50), dtype=np.float32)
>>> dset.read_direct(out, np.s_[:,0:50]) # dest_sel can be omitted
>>> means = out.mean(axis=1)

This may seem like a trivial case, but there's an important difference between the two

approaches. In the first example, the out array is created internally by h5py, used to

store the slice, and then thrown away. In the second example, out is allocated by the

user, and can be reused for future calls to read_direct.
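
The reuse pattern can be sketched as follows. This is an illustration under assumptions, not a benchmark: the in-memory file, the dataset name "traces", the block size of 10 rows, and the random data are all made up. A single buffer is allocated once and refilled on every pass through the loop.

```python
import numpy as np
import h5py

# Hypothetical in-memory file with 100 time traces of 1000 points each.
f = h5py.File("reuse_demo.h5", "w", driver="core", backing_store=False)
data = np.random.rand(100, 1000).astype(np.float32)
dset = f.create_dataset("traces", data=data)

block = 10                                         # rows per read (assumed)
buf = np.empty((block, 50), dtype=np.float32)      # allocated once
means = np.empty(dset.shape[0], dtype=np.float32)

for start in range(0, dset.shape[0], block):
    # Refill the same buffer with the first 50 points of the next 10 traces.
    dset.read_direct(buf, source_sel=np.s_[start:start + block, 0:50])
    means[start:start + block] = buf.mean(axis=1)

f.close()
```

With the sliced form, dset[start:start+block, 0:50], each pass would create a fresh array. Here the only allocations happen before the loop starts.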
