How can we get the value itself, without it being wrapped in a NumPy array? It turns out there's another way to slice into NumPy arrays (and Dataset objects). You can index with a somewhat bizarre-looking empty tuple:
>>> dset[()]
42
So keep these in your toolkit:
1. Using Ellipsis gives you all the elements in the dataset (always as an array object).
2. Using an empty tuple "()" gives you all the elements in the dataset, as an array object for 1D and higher datasets, and as a scalar element for 0D datasets.
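The two rules above can be checked on plain NumPy arrays, which the text says accept the same indexing forms as Dataset objects. This is a minimal sketch; the names scalar_arr and vector_arr are made up for illustration and stand in for 0D and 1D datasets.

```python
import numpy as np

# A 0-D (scalar) array and a 1-D array, standing in for datasets.
scalar_arr = np.array(42)
vector_arr = np.arange(3)

# Ellipsis always hands back an array object, even in the 0-D case.
via_ellipsis = scalar_arr[...]
print(type(via_ellipsis).__name__, via_ellipsis.shape)  # ndarray ()

# The empty tuple unwraps a 0-D array into a scalar element...
via_empty_tuple = scalar_arr[()]
print(isinstance(via_empty_tuple, np.ndarray))          # False

# ...but still returns an array for 1-D and higher shapes.
vector_all = vector_arr[()]
print(isinstance(vector_all, np.ndarray), vector_all.shape)  # True (3,)
```

In practice this suggests reaching for the empty tuple when a plain scalar is wanted from a 0D dataset, and for Ellipsis when downstream code expects an array regardless of shape.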
To make things even more confusing, you may see code in the wild that uses the .value attribute of a dataset. This is a historical wart that is exactly equivalent to doing dataset[()]. It's long deprecated and not available in modern versions of h5py.
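If you come across old code that reads dset.value, the empty-tuple form is the drop-in replacement. The sketch below assumes h5py is installed and uses an in-memory file so nothing is written to disk; the file name value_demo.h5 and the dataset names answer and series are invented for the example.

```python
import h5py

# driver="core" with backing_store=False keeps the file purely in memory.
with h5py.File("value_demo.h5", "w", driver="core", backing_store=False) as f:
    scalar_dset = f.create_dataset("answer", data=42)         # 0-D dataset
    vector_dset = f.create_dataset("series", data=[1, 2, 3])  # 1-D dataset

    # Older code might have used scalar_dset.value here; indexing with
    # the empty tuple is the equivalent spelling.
    answer = scalar_dset[()]
    series = vector_dset[()]

print(answer)        # 42
print(series.shape)  # (3,)
```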
Boolean Indexing
In an earlier example, we used an interesting expression to set negative entries in a NumPy array val to zero:
>>> val[val < 0] = 0
This is an idiom in NumPy made possible by Boolean-array indexing. If val is a NumPy array of integers, then the result of the expression val < 0 is an array of Booleans. Its entries are True where the corresponding elements of val are negative, and False elsewhere. In the NumPy world, this is also known as a mask.
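To make the mask concrete, here is a small sketch that builds one explicitly and then uses it for the assignment. The sample values in val are made up for illustration.

```python
import numpy as np

val = np.array([3, -1, 0, -7, 5])

# The comparison is applied element by element and yields a Boolean array.
mask = val < 0
print(mask.dtype)     # bool
print(mask.tolist())  # [False, True, False, True, False]

# Assigning through the mask touches only the entries where it is True.
val[mask] = 0
print(val.tolist())   # [3, 0, 0, 0, 5]
```

Writing val[val < 0] = 0 on one line does the same thing; naming the mask is mainly useful when the same selection is reused.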
Crucially, in both the NumPy and HDF5 worlds, you can use a Boolean array as an indexing expression. This does just what you'd expect; it selects the dataset elements where the corresponding index entries are True, and de-selects the rest.
In the spirit of the previous example, let's suppose we have a dataset initialized to a set of random numbers distributed between -1 and 1:
>>> data = np.random.random(10)*2 - 1
>>> data
array([ 0.98885498, -0.28554781, -0.17157685, -0.05227003,  0.66211931,
        0.45692186,  0.07123649, -0.40374417,  0.22059144, -0.82367672])
>>> dset = f.create_dataset('random', data=data)
Let's clip the negative values to 0, by using a Boolean array:
>>> dset[data < 0] = 0
>>> dset[...]