attribute), it's created with a fixed data type. Suppose you have multiple data products in a file (for example, many datasets containing image data), and you want to be sure each has exactly the same type.

HDF5 provides a native way to ensure this, by allowing you to save a data type to the file independently of any particular dataset or attribute. When you call create_dataset, you supply the stored type and HDF5 will “link” the type to the brand-new dataset.
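A minimal end-to-end sketch of this idea (the filename, type name, and dataset names here are invented for illustration, and an in-memory file is used so nothing is left on disk):

```python
import h5py
import numpy as np

# In-memory HDF5 file; nothing is written to disk
with h5py.File("types_demo.h5", "w", driver="core", backing_store=False) as f:
    # Save (commit) a data type to the file, independent of any dataset
    f["image_type"] = np.dtype("float32")

    # Every dataset created with the stored type gets exactly the same dtype
    for name in ("image_a", "image_b", "image_c"):
        f.create_dataset(name, (64, 64), dtype=f["image_type"])

    # All three datasets share the committed float32 type
    print({f[name].dtype for name in ("image_a", "image_b", "image_c")})
```

Because each dataset is linked to the one committed type, there is no risk of a typo (say, 'float64' in one place) silently producing mismatched datasets.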
The Datatype Object
You can create such an independent, or “named,” type by simply assigning a NumPy dtype to a name in the file:

>>> f['mytype'] = np.dtype('float32')
When we open the named type, we don't get a dtype back, but something else:
>>> out = f['mytype']
>>> out
<HDF5 named type "mytype" (dtype <f4)>
Like the Dataset object, this h5py.Datatype object is a thin proxy that allows access to the underlying HDF5 datatype. The most immediately obvious property is Datatype.dtype, which returns the equivalent NumPy dtype object:
>>> out.dtype
dtype('float32')
Since they're full-fledged objects in the file, you have a lot of other properties as well:
>>> out.name
u'/mytype'
>>> out.parent
<HDF5 group "/" (6 members)>
Also available are .file (the h5py.File instance containing the type), .ref (an object reference to the type), and attributes, just like Dataset and Group objects:

>>> out.attrs['info'] = "This is an attribute on a named type object"
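A short self-contained sketch (filename invented) confirming that attributes and naming on a Datatype behave just like those on datasets and groups:

```python
import h5py
import numpy as np

with h5py.File("attr_demo.h5", "w", driver="core", backing_store=False) as f:
    f["mytype"] = np.dtype("float32")
    out = f["mytype"]

    # Attributes attach to the named type exactly as they would to a dataset
    out.attrs["info"] = "This is an attribute on a named type object"

    print(out.attrs["info"])          # round-trips like any other attribute
    print(out.name, out.parent.name)  # the type has a path and a parent group
```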
In the HDF5 world, for technical reasons named types are now called committed types. You may hear both terms; for our purposes, they mean the same thing.
Linking to Named Types
It's simple to create a dataset or attribute that refers to a named type object; just supply the Datatype instance as the dtype:

>>> dset = f.create_dataset("typedemo", (100,), dtype=f['mytype'])
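A quick self-contained check (filename invented) that a dataset created this way really carries the stored type:

```python
import h5py
import numpy as np

with h5py.File("link_demo.h5", "w", driver="core", backing_store=False) as f:
    f["mytype"] = np.dtype("float32")
    dset = f.create_dataset("typedemo", (100,), dtype=f["mytype"])

    # The dataset's dtype is equivalent to the committed type's dtype
    print(dset.dtype == f["mytype"].dtype)  # True
```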