Type Guessing
When you create a dataset, you generally specify the data type you want by providing a NumPy dtype object. There are exceptions; for example, you get a single-precision float by omitting the dtype when calling create_dataset. But every dataset has an explicit dtype, and you can always discover what it is via the .dtype property:
>>> dset.dtype
dtype('float32')
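As a quick sketch of the default behavior (the file name here is made up, and the in-memory "core" driver is used just so nothing is written to disk), omitting the dtype yields a single-precision dataset:

>>> import h5py
>>> f_demo = h5py.File("dtype_demo.hdf5", "w", driver="core", backing_store=False)
>>> d = f_demo.create_dataset("dataset", (100,))  # no dtype given
>>> d.dtype
dtype('float32')
>>> f_demo.close()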
In contrast, with attributes h5py generally hides the type from you. It's important to remember that there is a definite type in the HDF5 file; the dictionary-style interface to attributes just means that it's usually inferred from the value you provide.
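A small sketch of that inference (the file name and attribute values are hypothetical, and an in-memory file is used to avoid touching disk): assigning plain Python values gives each attribute a concrete HDF5 type, which you can see by reading the value back as a NumPy object:

>>> import h5py
>>> f_demo = h5py.File("attrs_typing_demo.hdf5", "w", driver="core", backing_store=False)
>>> d = f_demo.create_dataset("dataset", (100,))
>>> d.attrs["run_id"] = 144          # stored as a native integer
>>> d.attrs["sample_rate"] = 1e8     # stored as a native double
>>> d.attrs["run_id"].dtype.kind     # integer kind; exact width is platform-dependent
'i'
>>> d.attrs["sample_rate"].dtype
dtype('float64')
>>> f_demo.close()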
Let's flush our file to disk with:

>>> f.flush()

and look at it with h5ls:
$ h5ls -vlr attrsdemo.hdf5
Opened "attrsdemo.hdf5" with sec2 driver.
/                        Group
    Location:  1:96
    Links:     1
/dataset                 Dataset {100/100}
    Attribute: run_id scalar
        Type:      native int
        Data:  144
    Attribute: sample_rate scalar
        Type:      native double
        Data:  1e+08
    Attribute: title scalar
        Type:      variable-length null-terminated ASCII string
        Data:  "Dataset from third round of experiments"
    Location:  1:800
    Links:     1
    Storage:   400 logical bytes, 0 allocated bytes
    Type:      native float
In most cases, the type is determined by simply passing the value to np.array and then storing the resulting object. For integers on 32-bit systems you would get a 32-bit ("native") integer:
>>> np.array(144).dtype
dtype('int32')
This explains the "native int" type for run_id.
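You can check the same inference rule yourself; note that the integer width depends on the platform (64-bit on most modern systems, 32-bit as shown in the output above), while a Python float always maps to a double:

>>> import numpy as np
>>> np.array(144).dtype.kind   # integer kind; width is platform-dependent
'i'
>>> np.array(1e8).dtype        # Python floats become "native double"
dtype('float64')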
You're not limited to scalar values, by the way. There's no problem storing whole NumPy
arrays in the file: