attribute), it's created with a fixed data type. Suppose you have multiple data products in a file (for example, many datasets containing image data), and you want to be sure each has exactly the same type.

HDF5 provides a native way to ensure this, by allowing you to save a data type to the file independently of any particular dataset or attribute. When you call create_dataset, you supply the stored type and HDF5 will “link” the type to the brand-new dataset.
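A minimal end-to-end sketch of this idea (the filename, type name, and dataset names here are invented for illustration, and an in-memory file is used so nothing is left on disk):

```python
import h5py
import numpy as np

# In-memory HDF5 file; nothing is written to disk
with h5py.File("types_demo.h5", "w", driver="core", backing_store=False) as f:
    # Save (commit) a data type to the file, independent of any dataset
    f["image_type"] = np.dtype("float32")

    # Every dataset created with the stored type gets exactly the same dtype
    for name in ("image_a", "image_b", "image_c"):
        f.create_dataset(name, (64, 64), dtype=f["image_type"])

    # All three datasets share the committed float32 type
    print({f[name].dtype for name in ("image_a", "image_b", "image_c")})
```

Because each dataset is linked to the one committed type, there is no risk of a typo (say, 'float64' in one place) silently producing mismatched datasets.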
The Datatype Object
You can create such an independent, or “named,” type by simply assigning a NumPy dtype to a name in the file:

>>> f['mytype'] = np.dtype('float32')
When we open the named type, we don't get a dtype back, but something else:
>>> out = f['mytype']
>>> out
<HDF5 named type "mytype" (dtype <f4)>
Like the Dataset object, this h5py.Datatype object is a thin proxy that allows access to the underlying HDF5 datatype. The most immediately obvious property is Datatype.dtype, which returns the equivalent NumPy dtype object:
>>> out.dtype
dtype('float32')
Since they're full-fledged objects in the file, you have a lot of other properties as well:
>>> out.name
u'/mytype'
>>> out.parent
<HDF5 group "/" (6 members)>
Also available are .file (the h5py.File instance containing the type), .ref (an object reference to the type), and attributes, just like Dataset and Group objects:

>>> out.attrs['info'] = "This is an attribute on a named type object"
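A short self-contained sketch (filename invented) confirming that attributes and naming on a Datatype behave just like those on datasets and groups:

```python
import h5py
import numpy as np

with h5py.File("attr_demo.h5", "w", driver="core", backing_store=False) as f:
    f["mytype"] = np.dtype("float32")
    out = f["mytype"]

    # Attributes attach to the named type exactly as they would to a dataset
    out.attrs["info"] = "This is an attribute on a named type object"

    print(out.attrs["info"])          # round-trips like any other attribute
    print(out.name, out.parent.name)  # the type has a path and a parent group
```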
In the HDF5 world, for technical reasons named types are now called committed types. You may hear both terms; for our purposes, they mean the same thing.
Linking to Named Types
It's simple to create a dataset or attribute that refers to a named type object; just supply the Datatype instance as the dtype:

>>> dset = f.create_dataset("typedemo", (100,), dtype=f['mytype'])
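A quick self-contained check (filename invented) that a dataset created this way really carries the stored type:

```python
import h5py
import numpy as np

with h5py.File("link_demo.h5", "w", driver="core", backing_store=False) as f:
    f["mytype"] = np.dtype("float32")
    dset = f.create_dataset("typedemo", (100,), dtype=f["mytype"])

    # The dataset's dtype is equivalent to the committed type's dtype
    print(dset.dtype == f["mytype"].dtype)  # True
```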