Each dataset has a fixed type that is defined when it's created and can never be changed.
HDF5 has a vast, expressive type mechanism that can easily handle the built-in NumPy
types, with few exceptions. For this reason, h5py always expresses the type of a dataset
using standard NumPy dtype objects.
There's another familiar attribute:
>>> dset.shape
(5, 2)
A dataset's shape is also defined when it's created, although as we'll see later, it can be
changed. Like NumPy arrays, HDF5 datasets can have between zero axes (scalar, shape ())
and 32 axes. Dataset axes can be up to 2^63 - 1 elements long.
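The dtype and shape attributes described above can be inspected with a short sketch like the following. The file name and the dataset names "ones" and "answer" are made up for illustration, and the file is opened with the in-memory "core" driver (with backing_store=False) only so that the example leaves nothing on disk.

```python
import numpy as np
import h5py

# Hypothetical in-memory file: driver="core" with backing_store=False
# means no file is actually written to disk.
with h5py.File("dtype_shape_demo.hdf5", "w", driver="core", backing_store=False) as f:
    # A 5x2 dataset of ones (dataset name "ones" is an assumption).
    dset = f.create_dataset("ones", data=np.ones((5, 2)))
    # A scalar dataset: zero axes, shape ().
    scalar = f.create_dataset("answer", data=42)

    dset_dtype = dset.dtype      # a standard NumPy dtype object
    dset_shape = dset.shape      # a tuple, as with NumPy arrays
    scalar_shape = scalar.shape  # empty tuple for a scalar dataset
    scalar_value = scalar[()]    # read the single value of a scalar dataset

print(dset_dtype, dset_shape, scalar_shape, scalar_value)
```

Indexing a scalar dataset with an empty tuple is the counterpart of dset[...] for datasets with zero axes.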
Reading and Writing
Datasets wouldn't be that useful if we couldn't get at the underlying data. First, let's see
what happens if we just read the entire dataset:
>>> out = dset[...]
>>> out
array([[ 1., 1.],
       [ 1., 1.],
       [ 1., 1.],
       [ 1., 1.],
       [ 1., 1.]])
>>> type(out)
<type 'numpy.ndarray'>
Slicing into a Dataset object returns a NumPy array. Keep in mind what's actually
happening when you do this: h5py translates your slicing selection into a portion of the
dataset and has HDF5 read the data from disk. In other words, ignoring caching, a slicing
operation results in a read or write to disk.
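One consequence of this is that the array you get back is an independent in-memory copy rather than a view of the file. The sketch below illustrates this under the same assumptions as the session above (a 5x2 dataset of ones); the file and dataset names are hypothetical, and the in-memory "core" driver is used only to keep the example self-contained.

```python
import numpy as np
import h5py

# Hypothetical in-memory file so the example writes nothing to disk.
with h5py.File("slicing_demo.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("ones", data=np.ones((5, 2)))

    out = dset[...]            # one read covering the whole dataset
    first_rows = dset[0:2, :]  # a separate read covering only two rows

    out[0, 0] = 99.0           # changes the NumPy copy, not the dataset
    unchanged = dset[0, 0]     # a fresh read from the dataset

print(type(out), first_rows.shape, unchanged)
```

Because every slicing expression is its own read, reading once into an array and working on that array is usually preferable to slicing the dataset repeatedly in a loop.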
Let's try updating just a portion of the dataset:
>>> dset[1:4,1] = 2.0
>>> dset[...]
array([[ 1., 1.],
       [ 1., 2.],
       [ 1., 2.],
       [ 1., 2.],
       [ 1., 1.]])
Success!
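As a quick check, the same partial update can be applied to a plain NumPy array and compared with what is read back from the dataset. This is only a sketch: the file and dataset names are invented, and the in-memory "core" driver is an assumption made to avoid touching the filesystem.

```python
import numpy as np
import h5py

# Hypothetical in-memory file so the example writes nothing to disk.
with h5py.File("update_demo.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("ones", data=np.ones((5, 2)))

    # Write a single value to rows 1-3 of column 1, as in the session above.
    dset[1:4, 1] = 2.0
    result = dset[...]  # read the whole dataset back

# The same assignment on an ordinary NumPy array, for comparison.
expected = np.ones((5, 2))
expected[1:4, 1] = 2.0

print(np.array_equal(result, expected))
```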