Each dataset has a fixed type that is defined when it's created and can never be changed.
HDF5 has a vast, expressive type mechanism that can easily handle the built-in NumPy
types, with few exceptions. For this reason, h5py always expresses the type of a dataset
using standard NumPy dtype objects.
There's another familiar attribute:
>>> dset.shape
(5, 2)
A dataset's shape is also defined when it's created, although as we'll see later, it can be
changed. Like NumPy arrays, HDF5 datasets can have between zero axes (scalar, shape
()) and 32 axes. Dataset axes can be up to 2^63-1 elements long.
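As a minimal sketch of the dtype and shape attributes described above, the snippet below creates one two-dimensional dataset and one scalar dataset and inspects them. The file name, the dataset names, and the in-memory driver="core" / backing_store=False settings are assumptions chosen so the example leaves nothing on disk; they are not taken from the text.

```python
import h5py
import numpy as np

# Assumption: an in-memory file (core driver, no backing store) so the
# demo does not write anything to disk. The names are hypothetical.
with h5py.File("shape_demo.hdf5", "w", driver="core", backing_store=False) as f:
    # A 5x2 dataset of double-precision floats
    grid = f.create_dataset("grid", (5, 2), dtype=np.float64)
    # A scalar dataset: zero axes, shape ()
    scalar = f.create_dataset("scalar", data=3.5)

    grid_dtype = grid.dtype        # a standard NumPy dtype object
    grid_shape = grid.shape        # tuple with one entry per axis
    scalar_shape = scalar.shape    # empty tuple for a scalar dataset
    scalar_value = scalar[()]      # scalar datasets are read with an empty tuple

print(grid_dtype, grid_shape, scalar_shape, scalar_value)
```

Note that the scalar dataset is read with an empty tuple index, mirroring how a zero-dimensional NumPy array is indexed.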
Reading and Writing
Datasets wouldn't be that useful if we couldn't get at the underlying data. First, let's see
what happens if we just read the entire dataset:
>>> out = dset[...]
>>> out
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])
>>> type(out)
<type 'numpy.ndarray'>
Slicing into a Dataset object returns a NumPy array. Keep in mind what's actually
happening when you do this: h5py translates your slicing selection into a portion of the
dataset and has HDF5 read the data from disk. In other words, ignoring caching, a slicing
operation results in a read or write to disk.
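To make the point about slicing concrete, here is a small sketch, assuming a 5x2 dataset of ones like the one in the session above. Each indexing expression is a separate read request, and each returns an ordinary NumPy array whose shape matches the selection. The file and dataset names and the in-memory driver settings are assumptions for the demo only.

```python
import h5py
import numpy as np

# Assumption: in-memory file so the example is self-contained.
with h5py.File("read_demo.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("ones", data=np.ones((5, 2)))

    whole = dset[...]      # reads every element of the dataset
    part = dset[1:3, :]    # reads only rows 1 and 2
    column = dset[:, 0]    # reads only the first column

print(type(whole).__name__, whole.shape, part.shape, column.shape)
```

Because every slice triggers its own read, it is usually cheaper to request one larger selection than to loop over many small ones.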
Let's try updating just a portion of the dataset:
>>> dset[1:4, 1] = 2.0
>>> dset[...]
array([[ 1.,  1.],
       [ 1.,  2.],
       [ 1.,  2.],
       [ 1.,  2.],
       [ 1.,  1.]])
Success!
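The partial update above depends on a dset object created earlier in the session. A self-contained version might look like the following sketch, which again assumes a 5x2 dataset of ones and uses a hypothetical in-memory file so nothing is left on disk.

```python
import h5py
import numpy as np

# Assumption: in-memory file and hypothetical names, for illustration only.
with h5py.File("write_demo.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("ones", data=np.ones((5, 2)))

    # Assign to rows 1 through 3 of the second column; only that
    # selection is written, and the rest of the dataset is untouched.
    dset[1:4, 1] = 2.0

    # Read everything back to confirm the change
    result = dset[...]

# The array we expect after the update, built by hand for comparison
expected = np.ones((5, 2))
expected[1:4, 1] = 2.0

print(np.array_equal(result, expected))
```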