>>> with h5py.File('bool.hdf5', 'w') as f2:
...     f2.create_dataset('bool', (100,), dtype=np.bool_)
And now let's see how it looks in the file, again using h5ls:
Opened "bool.hdf5" with sec2 driver.
/                        Group
    Location:  1:96
    Links:     1
/bool                    Dataset {100/100}
    Location:  1:800
    Links:     1
    Storage:   100 logical bytes, 0 allocated bytes
    Type:      enum native signed char {
                   FALSE = 0
                   TRUE  = 1
               }
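The enum storage shown above is invisible from the Python side. The following is a minimal sketch of a round trip, assuming h5py and NumPy are installed; the file name bool_demo.hdf5 is hypothetical, and the in-memory "core" driver is used only so that nothing is written to disk.

```python
import numpy as np
import h5py

# Hypothetical file name; driver='core' with backing_store=False keeps
# the whole file in memory, so no file is left behind on disk.
with h5py.File('bool_demo.hdf5', 'w', driver='core', backing_store=False) as f2:
    dset = f2.create_dataset('bool', (100,), dtype=np.bool_)
    dset[:10] = True              # set the first ten elements
    stored_dtype = dset.dtype     # dtype h5py reports for the dataset
    values = dset[...]            # read the full dataset back into NumPy

print(stored_dtype)               # the dataset presents itself as a NumPy bool
print(values.dtype, int(values.sum()))
```

On the HDF5 side the data is stored as an enum, but on the NumPy side it should read back as an ordinary Boolean array.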
The array Type
Not often encountered in NumPy code, the array type is a good choice when you want
to store multiple values of the same type in a single element. Unlike compound types,
there are no separate “fields”; rather, each element is itself a multidimensional array.
There are a couple of pitfalls associated with this type and with some “helpful” behavior
from NumPy, which can be confusing. Let's start with an example, in which our elements
are 2×2 arrays of floats:
>>> dt = np.dtype('(2,2)f')
>>> dt
dtype(('float32',(2, 2)))
Now let's create an HDF5 dataset with this dtype that has 100 data points:
>>> dset = f.create_dataset('array', (100,), dtype=dt)
>>> dset.dtype
dtype(('float32',(2, 2)))
>>> dset.shape
(100,)
Retrieving a single element gives us a 2×2 NumPy array:
>>> out = dset[0]
>>> out
array([[ 0., 0.],
       [ 0., 0.]], dtype=float32)
You might have expected a NumPy scalar with our original dtype, but it doesn't work
that way. NumPy automatically “promotes” the array-type scalar into a full-fledged array
of the base type. This is convenient, but it's another case where dset[...].dtype != dset.dtype.
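The same promotion can be observed with NumPy alone, without any HDF5 file. This is a small sketch rather than a description of h5py internals: it builds an in-memory array from the dtype used above and inspects the result. The variable name arr is introduced here only for illustration.

```python
import numpy as np

dt = np.dtype('(2,2)f')        # the array dtype from the example above

# The dtype itself remembers its base type and per-element shape.
print(dt.base)                 # float32
print(dt.shape)                # (2, 2)

# Creating an array with this dtype appends the (2, 2) element shape
# to the requested shape and leaves only the base type as the dtype.
arr = np.zeros((100,), dtype=dt)
print(arr.shape)               # (100, 2, 2)
print(arr.dtype)               # float32
print(arr.dtype == dt)         # False
```

The extra dimensions and the base-type dtype explain why an element read from the dataset arrives as a plain 2×2 float32 array rather than as a scalar carrying the original dtype.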