Type Guessing
When you create a dataset, you generally specify the data type you want by providing a NumPy dtype object. There are exceptions; for example, you get a single-precision float by omitting the dtype when calling create_dataset. But every dataset has an explicit dtype, and you can always discover what it is via the .dtype property:
>>> dset.dtype
dtype('float32')
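As a quick sketch of the default behavior (the file name here is made up, and the in-memory "core" driver is used just so nothing is written to disk), omitting the dtype yields a single-precision dataset:

>>> import h5py
>>> f_demo = h5py.File("dtype_demo.hdf5", "w", driver="core", backing_store=False)
>>> d = f_demo.create_dataset("dataset", (100,))  # no dtype given
>>> d.dtype
dtype('float32')
>>> f_demo.close()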
In contrast, with attributes h5py generally hides the type from you. It's important to remember that there is a definite type in the HDF5 file; the dictionary-style interface to attributes just means that it's usually inferred from the value you provide.
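A small sketch of that inference (the file name and attribute values are hypothetical, and an in-memory file is used to avoid touching disk): assigning plain Python values gives each attribute a concrete HDF5 type, which you can see by reading the value back as a NumPy object:

>>> import h5py
>>> f_demo = h5py.File("attrs_typing_demo.hdf5", "w", driver="core", backing_store=False)
>>> d = f_demo.create_dataset("dataset", (100,))
>>> d.attrs["run_id"] = 144          # stored as a native integer
>>> d.attrs["sample_rate"] = 1e8     # stored as a native double
>>> d.attrs["run_id"].dtype.kind     # integer kind; exact width is platform-dependent
'i'
>>> d.attrs["sample_rate"].dtype
dtype('float64')
>>> f_demo.close()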
Let's flush our file to disk with:

>>> f.flush()

and look at it with h5ls:
$ h5ls -vlr attrsdemo.hdf5
Opened "attrsdemo.hdf5" with sec2 driver.
/                        Group
    Location:  1:96
    Links:     1
/dataset                 Dataset {100/100}
    Attribute: run_id scalar
        Type:      native int
        Data:  144
    Attribute: sample_rate scalar
        Type:      native double
        Data:  1e+08
    Attribute: title scalar
        Type:      variable-length null-terminated ASCII string
        Data:  "Dataset from third round of experiments"
    Location:  1:800
    Links:     1
    Storage:   400 logical bytes, 0 allocated bytes
    Type:      native float
In most cases, the type is determined by simply passing the value to np.array and then storing the resulting object. For integers on 32-bit systems you would get a 32-bit ("native") integer:
>>> np.array(144).dtype
dtype('int32')
This explains the "native int" type for run_id.
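You can check the same inference rule yourself; note that the integer width depends on the platform (64-bit on most modern systems, 32-bit as shown in the output above), while a Python float always maps to a double:

>>> import numpy as np
>>> np.array(144).dtype.kind   # integer kind; width is platform-dependent
'i'
>>> np.array(1e8).dtype        # Python floats become "native double"
dtype('float64')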
You're not limited to scalar values, by the way. There's no problem storing whole NumPy
arrays in the file: