Working with Datasets - Python and HDF5


The following steps take place:

1. h5py figures out the size of the selection and determines whether it is compatible with the size of the array being assigned.

2. HDF5 makes an appropriately sized selection on the dataset.

3. HDF5 reads from the input array and writes to the file.
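As a rough illustration of the first step, the sketch below assigns a compatible and an incompatible array to the same selection. The file name and dataset name are placeholders, and the in-memory `core` driver is used only so that nothing is written to disk. The exact exception type is not stated in the text above, so the example catches it broadly.

```python
import h5py
import numpy as np

# Hypothetical in-memory file; nothing is written to disk.
with h5py.File('shape_check.hdf5', 'w', driver='core', backing_store=False) as f:
    dset = f.create_dataset('demo', data=np.zeros(10, dtype='i8'))

    # Compatible: a 5-element selection receives a 5-element array.
    dset[0:5] = np.arange(5)

    # Incompatible: a 5-element selection cannot take a 3-element array,
    # so the size check fails before any data reaches the file.
    try:
        dset[0:5] = np.arange(3)
        mismatch_rejected = False
    except (TypeError, ValueError):
        mismatch_rejected = True

    result = dset[...]
```

The rejected assignment leaves the dataset untouched, because the size check happens before any data is written.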

All of the overhead involved in figuring out the slice sizes and so on still applies. Writing to a dataset one element at a time, or even a few elements at a time, is a great way to get poor performance.
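To make the contrast concrete, here is a minimal sketch of the two write patterns. The file and dataset names are placeholders and the file is kept in memory. Both loops store the same data; the difference is that the first repeats the selection bookkeeping for every element, while the second performs it once.

```python
import h5py
import numpy as np

values = np.arange(100, dtype='f8')

# Hypothetical in-memory file; nothing is written to disk.
with h5py.File('write_demo.hdf5', 'w', driver='core', backing_store=False) as f:
    dset = f.create_dataset('demo', (100,), dtype='f8')

    # Slow pattern: one selection and one write per element.
    for i in range(len(values)):
        dset[i] = values[i]
    slow_copy = dset[...]

    # Preferred pattern: a single selection and a single write.
    dset[...] = values
    fast_copy = dset[...]
```

How large the gap is depends on the dataset size and the storage involved, so it is worth timing both patterns on your own data.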

Start-Stop-Step Indexing

h5py uses a subset of the plain-vanilla slicing available in NumPy. This is the most familiar form, consisting of up to three indices providing a start, stop, and step.

For example, let's create a simple 10-element dataset with increasing values:

>>> dset = f.create_dataset('range', data=np.arange(10))
>>> dset[...]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

One index picks a particular element:

>>> dset[4]
4

Two indices specify a range, ending just before the last index:

>>> dset[4:8]
array([4, 5, 6, 7])

Three indices provide a “step,” or pitch, giving the spacing between the selected elements:

>>> dset[4:8:2]
array([4, 6])

And of course you can get all the points by simply providing a bare colon (:), like this:

>>> dset[:]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

As in NumPy, you can use negative numbers to “count back” from the end of the dataset, with -1 referring to the last element:

>>> dset[4:-1]
array([4, 5, 6, 7, 8])
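The arithmetic behind start-stop-step selections can be checked without a file. The sketch below uses Python's built-in `slice.indices` to resolve a slice against a length of 10; it illustrates the indexing convention only and is not a description of how h5py implements it internally.

```python
n = 10  # length of the example dataset

# Resolve [4:-1] against the length: -1 becomes n - 1 = 9.
start, stop, step = slice(4, -1).indices(n)
negative_stop = list(range(start, stop, step))

# Resolve [4:8:2]: every second index from 4 up to, but not including, 8.
start, stop, step = slice(4, 8, 2).indices(n)
with_step = list(range(start, stop, step))

print(negative_stop)  # [4, 5, 6, 7, 8]
print(with_step)      # [4, 6]
```

Because the example dataset holds the values 0 through 9, the selected indices coincide with the values shown in the session above.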

Unlike NumPy, you can't pull fancy tricks with the indices. For example, the traditional way to reverse an array in NumPy is this:
