CHAPTER 3
Working with Datasets
Datasets are the central feature of HDF5. You can think of them as NumPy arrays that
live on disk. Every dataset in HDF5 has a name, a type, and a shape, and supports random
access. Unlike the built-in np.save and friends, there's no need to read and write the
entire array as a block; you can use the standard NumPy syntax for slicing to read and
write just the parts you want.
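As a rough sketch of what partial I/O looks like in practice, the snippet below writes one small block of a larger dataset and reads back only that block plus a single row. The file name, the dataset name "grid", and the sizes are illustrative assumptions; create_dataset is the h5py call for making an empty dataset of a given shape and type.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file; the name is only for illustration.
path = os.path.join(tempfile.mkdtemp(), "slicing_demo.hdf5")

with h5py.File(path, "w") as f:
    # Create a 100 x 100 float64 dataset; unwritten elements read as 0.
    dset = f.create_dataset("grid", shape=(100, 100), dtype="f8")
    # Write a single 10 x 10 block using ordinary NumPy slicing syntax.
    dset[10:20, 30:40] = np.ones((10, 10))

with h5py.File(path, "r") as f:
    # Read back only the block we wrote, and one untouched row.
    block = f["grid"][10:20, 30:40]
    row = f["grid"][0, :]

block_sum = float(block.sum())
row_sum = float(row.sum())
```

Only the selected regions are transferred between disk and memory; the rest of the 100 x 100 array is never loaded.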
Dataset Basics
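First, let's create a file so we have somewhere to store our datasets (h5py and NumPy are assumed to be imported as h5py and np; mode "w" creates the file, overwriting any existing one):
>>> f = h5py.File("testfile.hdf5", "w")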
Every dataset in an HDF5 file has a name. Let's see what happens if we just assign a new
NumPy array to a name in the file:
>>> arr = np.ones((5, 2))
>>> f["my dataset"] = arr
>>> dset = f["my dataset"]
>>> dset
<HDF5 dataset "my dataset": shape (5, 2), type "<f8">
We put in a NumPy array but got back something else: an instance of the class
h5py.Dataset. This is a “proxy” object that lets you read and write to the underlying
HDF5 dataset on disk.
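To make the proxy behavior concrete, here is a minimal sketch that checks the type of the returned object, pulls its contents into memory, and writes a value back through it. The temporary file path and the value 42.0 are illustrative assumptions, not part of the example above.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file; the name is only for illustration.
path = os.path.join(tempfile.mkdtemp(), "proxy_demo.hdf5")

with h5py.File(path, "w") as f:
    f["my dataset"] = np.ones((5, 2))
    dset = f["my dataset"]

    # The object we get back is an h5py.Dataset, not a NumPy array.
    is_dataset = isinstance(dset, h5py.Dataset)
    is_ndarray = isinstance(dset, np.ndarray)

    # Slicing the proxy reads from disk and returns a real NumPy array.
    in_memory = dset[...]

    # Assigning through the proxy writes to the dataset in the file.
    dset[0, 0] = 42.0
    first_value = float(f["my dataset"][0, 0])
```

Note that in_memory is an independent copy: it still holds 1.0 at position (0, 0) after the write, because it was read before the dataset changed.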
Type and Shape
Let's explore the Dataset object. If you're using IPython, type dset. and hit Tab to see
the object's attributes; otherwise, call dir(dset). There are a lot, but a few stand out:
>>> dset.dtype
dtype('float64')
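As a small sketch of how these attributes line up with NumPy, the snippet below compares the dataset's dtype and shape with those of the array it was created from, and lists the public names that dir reports. The temporary file path and the variable names are illustrative assumptions.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical scratch file; the name is only for illustration.
path = os.path.join(tempfile.mkdtemp(), "attrs_demo.hdf5")

with h5py.File(path, "w") as f:
    arr = np.ones((5, 2))
    f["my dataset"] = arr
    dset = f["my dataset"]

    # The dataset reports the same type and shape as the source array.
    same_dtype = dset.dtype == arr.dtype
    same_shape = dset.shape == arr.shape
    dtype_name = dset.dtype.name
    shape = dset.shape

    # Public attribute names, as a non-IPython alternative to Tab completion.
    public_names = [name for name in dir(dset) if not name.startswith("_")]
```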