Storing Metadata with Attributes - Python and HDF5


string as an attribute. That created a variable-length ASCII string (see “Variable-Length Strings” on page 89).

In contrast, an instance of np.string_ would get stored as a fixed-length string in the file:

>>> dset.attrs['title_fixed'] = np.string_("Another title")

This generally isn't an issue, but some older FORTRAN-based programs can't deal with variable-length strings. If this is a problem for your application, use np.string_, or equivalently, arrays of NumPy type S.
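As a rough check of the difference, the sketch below stores one attribute of each kind and asks h5py which string type ended up in the file. It assumes h5py 2.10 or later (for h5py.check_string_dtype) and uses np.bytes_, since np.string_ was an alias for it that NumPy 2.0 removed. The file name and the in-memory "core" driver are illustrative choices, not part of the original example.

```python
import h5py
import numpy as np

# In-memory file (assumption: nothing needs to be written to disk for this check).
with h5py.File("attrs_sketch.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("dataset", (100,), dtype="f4")

    # A plain Python string becomes a variable-length string attribute.
    dset.attrs["title"] = "Dataset from third round of experiments"

    # A NumPy byte string becomes a fixed-length string attribute.
    dset.attrs["title_fixed"] = np.bytes_(b"Another title")

    # Inspect the stored types through the low-level attribute handles.
    var_info = h5py.check_string_dtype(dset.attrs.get_id("title").dtype)
    fixed_info = h5py.check_string_dtype(dset.attrs.get_id("title_fixed").dtype)

# length is None for variable-length strings, and the byte count otherwise.
print("title:      ", var_info)
print("title_fixed:", fixed_info)
```

The reported encoding of the variable-length attribute depends on the Python and h5py versions in use, so only the length field is a reliable way to tell the two kinds apart here.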

By the way, you can also store Unicode strings in the file. They're written out with the HDF5-approved UTF-8 encoding:

>>> dset.attrs['Yet another title'] = u'String with accent (\u00E9)'
>>> f.flush()

Here's what the file looks like now, with our fixed-length and Unicode strings inside:

$ h5ls -vlr attrsdemo.hdf5/dataset
Opened "attrsdemo.hdf5" with sec2 driver.
dataset                  Dataset {100/100}
    Attribute: Yet\ another\ title scalar
        Type:      variable-length null-terminated UTF-8 string
        Data:  "String with accent (\37777777703\37777777651)"
    Attribute: ones scalar
        Type:      object reference
        Data:  DATASET-1:70568
    Attribute: run_id scalar
        Type:      native int
        Data:  144
    Attribute: sample_rate scalar
        Type:      native double
        Data:  1e+08
    Attribute: title scalar
        Type:      variable-length null-terminated ASCII string
        Data:  "Dataset from third round of experiments"
    Attribute: title_fixed scalar
        Type:      13-byte null-padded ASCII string
        Data:  "Another title"
    Location:  1:800
    Links:     1
    Storage:   400 logical bytes, 0 allocated bytes
    Type:      native float
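The odd-looking escapes in the Data line of the "Yet another title" attribute are the two UTF-8 bytes of the accented character. The sketch below, which uses only the Python standard library, reproduces them under one assumption that is an inference from the listing rather than documented behavior: each byte is treated as a signed char and printed as a 32-bit octal number. The helper name h5ls_style_escape is hypothetical.

```python
# UTF-8 encodes U+00E9 as two bytes, 0xC3 and 0xA9.
accent_bytes = "\u00e9".encode("utf-8")


def h5ls_style_escape(byte):
    """Render one byte as a backslash plus 32-bit octal.

    Assumption: the byte is sign-extended first, which is what
    makes values above 0x7F appear as long runs of 7s.
    """
    signed = byte - 256 if byte >= 128 else byte
    return "\\" + format(signed & 0xFFFFFFFF, "o")


escaped = "".join(h5ls_style_escape(b) for b in accent_bytes)

print(accent_bytes)
print(escaped)
```

Under that assumption the result matches the listing exactly, which suggests the file holds ordinary UTF-8 and only the display is unusual.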

There is one more thing to mention about strings, and it has to do with the strict separation in Python 3 between byte strings and text strings.

When you read an attribute from a file, you generally get an object with the same type as in HDF5. So if we were to store a NumPy int32, we would get an int32 back.
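A minimal sketch of that round trip follows, assuming h5py and NumPy are installed. The file name and the in-memory "core" driver are illustrative only. It also reads back a fixed-length string, which on Python 3 is expected to arrive as a byte string rather than text.

```python
import h5py
import numpy as np

# In-memory file (assumption: used only to keep the sketch self-contained).
with h5py.File("roundtrip_sketch.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("dataset", (100,), dtype="f4")

    # Store a 32-bit integer and a fixed-length byte string as attributes.
    dset.attrs["run_id"] = np.int32(144)
    dset.attrs["title_fixed"] = np.bytes_(b"Another title")

    # Read both back while the file is still open.
    run_id = dset.attrs["run_id"]
    title_fixed = dset.attrs["title_fixed"]

print(type(run_id), run_id)
print(type(title_fixed), title_fixed)
```

What comes back for variable-length strings differs between h5py versions, so the sketch leaves that case out.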
