Working with Datasets - Python and HDF5


How can we get the value itself, without it being wrapped in a NumPy array? It turns out there's another way to slice into NumPy arrays (and Dataset objects). You can index with a somewhat bizarre-looking empty tuple:

>>> dset[()]

42

So keep these in your toolkit:

1. Using Ellipsis gives you all the elements in the dataset (always as an array object).

2. Using an empty tuple () gives you all the elements in the dataset, as an array object for 1D and higher datasets, and as a scalar element for 0D datasets.
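
To make the difference concrete, here is a minimal sketch using plain NumPy arrays, which follow the same two indexing rules listed above. The names zero_d and one_d are illustrative and do not come from the text.

```python
import numpy as np

zero_d = np.array(42)        # 0-D (scalar) array
one_d = np.arange(3)         # 1-D array

with_ellipsis = zero_d[...]  # Ellipsis: still an array object, with shape ()
with_tuple = zero_d[()]      # empty tuple: a NumPy scalar, not an array

print(type(with_ellipsis).__name__, with_ellipsis.shape)
print(type(with_tuple).__name__, with_tuple)

# For 1-D and higher, both forms hand back an array
print(type(one_d[...]).__name__, type(one_d[()]).__name__)
```

The two forms only differ in the 0D case, which is why the empty tuple is the one to reach for when a bare value is wanted.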

To make things even more confusing, you may see code in the wild that uses the .value attribute of a dataset. This is a historical wart that is exactly equivalent to doing dataset[()]. It was long deprecated and is not available in modern versions of h5py (it was removed in h5py 3.0).

Boolean Indexing

In an earlier example, we used an interesting expression to set negative entries in a NumPy array val to zero:

>>> val[val < 0] = 0

This is an idiom in NumPy made possible by Boolean-array indexing. If val is a NumPy array of integers, then the result of the expression val < 0 is an array of Booleans. Its entries are True where the corresponding elements of val are negative, and False elsewhere. In the NumPy world, this is also known as a mask.
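
As a small sketch of what the mask looks like on its own, the snippet below builds one explicitly before using it. The sample values are made up for illustration.

```python
import numpy as np

val = np.array([-2, 5, -1, 3])   # illustrative values, not from the text

mask = val < 0                   # Boolean array with the same shape as val
print(mask.dtype)                # bool
print(mask.tolist())             # [True, False, True, False]

val[mask] = 0                    # same effect as val[val < 0] = 0
print(val.tolist())              # [0, 5, 0, 3]
```

Naming the mask is optional, but it is handy when the same selection is reused several times.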

Crucially, in both the NumPy and HDF5 worlds, you can use a Boolean array as an indexing expression. This does just what you'd expect; it selects the dataset elements where the corresponding index entries are True, and de-selects the rest.
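
The following is a minimal sketch of reading through a mask on the HDF5 side, assuming h5py and NumPy are installed. The file name, dataset name, and sample values are hypothetical; the in-memory "core" driver is used only so that nothing is written to disk.

```python
import numpy as np
import h5py

values = np.array([3.0, -1.5, 0.25, -4.0])   # illustrative data
mask = values < 0                            # must match the dataset's shape

# driver="core" with backing_store=False keeps the file purely in memory
with h5py.File("mask_demo.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("example", data=values)
    selected = dset[mask]      # elements where the mask is True
    unselected = dset[~mask]   # elements where the mask is False

print(selected.tolist())       # [-1.5, -4.0]
print(unselected.tolist())     # [3.0, 0.25]
```

Reading through a mask returns a one-dimensional array holding just the selected elements, in dataset order.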

In the spirit of the previous example, let's suppose we have a dataset initialized to a set of random numbers distributed between -1 and 1:

>>> data = np.random.random(10)*2 - 1

>>> data

array([ 0.98885498, -0.28554781, -0.17157685, -0.05227003,  0.66211931,
        0.45692186,  0.07123649, -0.40374417,  0.22059144, -0.82367672])

>>> dset = f.create_dataset('random', data=data)

Let's clip the negative values to 0, by using a Boolean array:

>>> dset[data < 0] = 0

>>> dset[...]
