Working with Datasets - Python and HDF5


Each dataset has a fixed type that is defined when it's created and can never be changed.
HDF5 has a vast, expressive type mechanism that can easily handle the built-in NumPy
types, with few exceptions. For this reason, h5py always expresses the type of a dataset
using standard NumPy dtype objects.
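As a minimal sketch of the point about types, the snippet below creates a dataset with an explicit type and inspects it. The file name, the dataset name, and the choice of an in-memory file (the `core` driver with `backing_store=False`) are assumptions made for illustration only.

```python
import numpy as np
import h5py

# In-memory HDF5 file so the example leaves nothing on disk.
# File and dataset names are hypothetical.
f = h5py.File("dtype_demo.hdf5", "w", driver="core", backing_store=False)

# The type is chosen when the dataset is created...
dset = f.create_dataset("ones", (5, 2), dtype=np.float32)

# ...and h5py reports it as an ordinary NumPy dtype object.
stored_dtype = dset.dtype
dtype_is_numpy = isinstance(stored_dtype, np.dtype)
print(stored_dtype, dtype_is_numpy)

# Data of another type is converted to the dataset's type on write,
# so reading it back yields the dataset's type, not the source type.
dset[...] = np.ones((5, 2), dtype=np.int64)
out = dset[...]
print(out.dtype)

f.close()
```

The dataset keeps the type it was created with even though the data written to it was `int64`.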

There's another familiar attribute:

>>> dset.shape
(5, 2)

A dataset's shape is also defined when it's created, although as we'll see later, it can be
changed. Like NumPy arrays, HDF5 datasets can have between zero axes (scalar, shape ())
and 32 axes. Dataset axes can be up to 2^63 - 1 elements long.
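The two ends of that statement can be sketched as follows: a scalar dataset with shape `()`, and a dataset whose shape is changed after creation. The names are hypothetical, and the use of `maxshape` together with `resize` is one way h5py allows a shape to change; it is shown here only as a preview under that assumption.

```python
import numpy as np
import h5py

# In-memory file; file and dataset names are made up for illustration.
f = h5py.File("shape_demo.hdf5", "w", driver="core", backing_store=False)

# A scalar dataset has zero axes, so its shape is the empty tuple.
scalar = f.create_dataset("scalar", data=42)
scalar_shape = scalar.shape
scalar_value = scalar[()]   # an empty-tuple index reads the single element

# To be resizable, a dataset needs maxshape at creation time;
# None marks an axis as unlimited.
dset = f.create_dataset("resizable", (5, 2), dtype="f8", maxshape=(None, 2))
shape_before = dset.shape
dset.resize((8, 2))
shape_after = dset.shape

print(scalar_shape, shape_before, shape_after)

f.close()
```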

Reading and Writing

Datasets wouldn't be that useful if we couldn't get at the underlying data. First, let's see
what happens if we just read the entire dataset:

>>> out = dset[...]
>>> out
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])
>>> type(out)
<class 'numpy.ndarray'>

Slicing into a Dataset object returns a NumPy array. Keep in mind what's actually happening
when you do this: h5py translates your slicing selection into a portion of the dataset and
has HDF5 read the data from disk. In other words, ignoring caching, a slicing operation
results in a read or write to disk.
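One consequence worth seeing in code: the array returned by a slice is an independent in-memory copy, so changing it does not touch the dataset. This is a small sketch with hypothetical names; it uses an in-memory file for convenience, so "disk" here stands in for the HDF5 file's storage.

```python
import numpy as np
import h5py

# In-memory file; file and dataset names are made up for illustration.
f = h5py.File("read_demo.hdf5", "w", driver="core", backing_store=False)
dset = f.create_dataset("ones", data=np.ones((5, 2)))

# Only the selected rows are read; the result is a plain NumPy array.
part = dset[0:2, :]
part_type = type(part)

# Modifying the returned array changes the copy, not the dataset.
part[...] = 99.0
unchanged = dset[...]

print(part_type is np.ndarray, part.shape, unchanged.max())

f.close()
```

To change the stored data, the assignment has to go through the Dataset object itself rather than through an array read from it.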

Let's try updating just a portion of the dataset:

>>> dset[1:4, 1] = 2.0
>>> dset[...]
array([[ 1.,  1.],
       [ 1.,  2.],
       [ 1.,  2.],
       [ 1.,  2.],
       [ 1.,  1.]])

Success!
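To check that a partial update like this really lands in the file, one option is to write the slice, close the file, and read it back in a fresh session. The sketch below does that with a temporary file; the path and dataset name are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
import h5py

# Temporary location for the example file (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "write_demo.hdf5")

# Write a 5x2 dataset of ones, then update rows 1..3 of column 1.
with h5py.File(path, "w") as f:
    dset = f.create_dataset("ones", data=np.ones((5, 2)))
    dset[1:4, 1] = 2.0

# Reopen read-only and load the whole dataset.
with h5py.File(path, "r") as f:
    result = f["ones"][...]

# Build the same result with plain NumPy for comparison.
expected = np.ones((5, 2))
expected[1:4, 1] = 2.0
matches = np.array_equal(result, expected)
print(matches)
```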
