the data across all 100 remaining indices. It's as efficient as you can get; there's only one
slicing operation, and the remainder of the time is spent writing data to disk.
Reading Directly into an Existing Array
Finally we come full circle back to read_direct, one of the most powerful methods
available on the Dataset object. It's as close as you can get to the “traditional” C interface
of HDF5 without getting into the internal details of h5py.
To recap, you can use read_direct to have HDF5 “fill in” data into an existing array,
automatically performing type conversion. Last time we saw how to read float32 data
into a float64 NumPy array:
>>> dset.dtype
dtype('float32')
>>> out = np.empty((100, 1000), dtype=np.float64)
>>> dset.read_direct(out)
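To make the type-conversion step concrete, here is a minimal self-contained sketch. It assumes an in-memory HDF5 file (h5py's "core" driver with backing_store=False) standing in for a real file on disk; the file name "scratch.h5" and dataset name "data" are hypothetical.

```python
import numpy as np
import h5py

# Assumption: an in-memory file replaces the real data file; nothing touches disk.
with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
    # 100 time traces of 1000 float32 samples each (hypothetical data)
    src = np.random.default_rng(0).random((100, 1000), dtype=np.float32)
    dset = f.create_dataset("data", data=src)

    # Preallocate a float64 buffer; HDF5 converts float32 -> float64 during the read
    out = np.empty((100, 1000), dtype=np.float64)
    dset.read_direct(out)

print(out.dtype)
```

The buffer keeps its own dtype; HDF5 does the widening conversion as it copies, so no intermediate float32 array is created on the NumPy side.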
This works, but requires you to read the entire dataset in one go. Let's pick a more useful
example. Suppose we wanted to read the first time trace, at dset[0,:], and deposit it
into the out array at out[50,:]. We can use the source_sel and dest_sel keywords,
for source selection and destination selection respectively:
>>> dset.read_direct(out, source_sel=np.s_[0,:], dest_sel=np.s_[50,:])
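A self-contained version of the call above, as a sketch under the same assumptions (a hypothetical in-memory file "scratch.h5" and dataset "data"), lets you check that only the destination row is touched:

```python
import numpy as np
import h5py

with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
    # Hypothetical data: trace i holds the values i*1000 .. i*1000+999
    data = np.arange(100 * 1000, dtype=np.float32).reshape(100, 1000)
    dset = f.create_dataset("data", data=data)

    out = np.zeros((100, 1000), dtype=np.float64)
    # Copy the first time trace (dset[0,:]) into row 50 of the output buffer
    dset.read_direct(out, source_sel=np.s_[0, :], dest_sel=np.s_[50, :])

# Row 50 now holds trace 0; every other row of out is still zero
print(out[50, :5])
```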
The odd-looking np.s_ is a gadget that takes slices in ordinary array-slicing syntax
and returns the corresponding slice object (or, for a multidimensional selection such as [0,:], a tuple of slices and integers).
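A quick way to see what np.s_ actually produces is to inspect its results directly; this uses only NumPy, and the small array arr is purely illustrative:

```python
import numpy as np

# np.s_ captures exactly what Python would pass to __getitem__
sel_row = np.s_[0, :]       # tuple: (0, slice(None, None, None))
sel_block = np.s_[:, 0:50]  # tuple: (slice(None, None, None), slice(0, 50, None))
sel_1d = np.s_[10:20]       # a single slice: slice(10, 20, None)

# The same objects work as indices for ordinary NumPy arrays
arr = np.arange(12).reshape(3, 4)
print(sel_row, sel_block, sel_1d)
print(arr[sel_row])
```

Because these are plain Python objects, they can be built once, stored in variables, and passed to read_direct (or any indexing operation) later.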
By the way, you don't have to match the shape of your output array to the dataset. Suppose
our application wanted to compute the mean of the first 50 data points in each time
trace, a common scenario when estimating DC offsets in real-world experimental data.
You could do this using the standard slicing techniques:
>>> out = dset[:, 0:50]
>>> out.shape
(100, 50)
>>> means = out.mean(axis=1)
>>> means.shape
(100,)
Using read_direct this would look like:
>>> out = np.empty((100, 50), dtype=np.float32)
>>> dset.read_direct(out, np.s_[:, 0:50])  # dest_sel can be omitted
>>> means = out.mean(axis=1)
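Putting both approaches side by side in one runnable sketch (again assuming a hypothetical in-memory file "scratch.h5" with a dataset "data") shows that they produce the same per-trace means:

```python
import numpy as np
import h5py

with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
    src = np.random.default_rng(1).random((100, 1000), dtype=np.float32)
    dset = f.create_dataset("data", data=src)

    # Approach 1: ordinary slicing; h5py allocates a fresh (100, 50) array
    sliced = dset[:, 0:50]
    means_slice = sliced.mean(axis=1)

    # Approach 2: read_direct into a caller-owned buffer (dest_sel omitted,
    # so the whole buffer is the destination)
    out = np.empty((100, 50), dtype=np.float32)
    dset.read_direct(out, np.s_[:, 0:50])
    means_direct = out.mean(axis=1)

print(means_slice.shape, means_direct.shape)
```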
This may seem like a trivial case, but there's an important difference between the two
approaches. In the first example, the out array is created internally by h5py, used to
store the slice, and then thrown away. In the second example, out is allocated by the
user, and can be reused for future calls to read_direct.
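The payoff of a user-allocated buffer shows up when you read in pieces. The following is a minimal sketch, assuming a hypothetical in-memory file and a hypothetical block size BLOCK that divides the number of traces evenly; it computes the same 50-sample means while allocating the read buffer only once:

```python
import numpy as np
import h5py

BLOCK = 10  # hypothetical number of traces per read; assumed to divide 100 evenly

with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
    src = np.random.default_rng(2).random((100, 1000), dtype=np.float32)
    dset = f.create_dataset("data", data=src)

    # One buffer, allocated once and overwritten by every read_direct call
    buf = np.empty((BLOCK, 50), dtype=np.float32)
    means = np.empty(dset.shape[0], dtype=np.float32)

    for start in range(0, dset.shape[0], BLOCK):
        # First 50 samples of traces start .. start+BLOCK-1 land in the same buffer
        dset.read_direct(buf, source_sel=np.s_[start:start + BLOCK, 0:50])
        means[start:start + BLOCK] = buf.mean(axis=1)

    # Reference result from ordinary slicing, for comparison
    reference = dset[:, 0:50].mean(axis=1)
```

In a long-running loop this avoids allocating and discarding a new array on every iteration, which is exactly the difference between the two approaches described above.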