Storing Metadata with Attributes - Python and HDF5


Looking at the file in h5ls:

$ h5ls -vlr attrs_create.hdf5
Opened "attrs_create.hdf5" with sec2 driver.
/                        Group
    Location:  1:96
    Links:     1
/dataset                 Dataset {100/100}
    Attribute: two_byte_int scalar
        Type:      native short
        Data:  190
    Location:  1:800
    Links:     1
    Storage:   400 logical bytes, 0 allocated bytes
    Type:      native float

This is a great way to make sure you get the right flavor of string. Unlike scalar strings, an array-like object of strings is, by default, passed through NumPy and ends up as fixed-length strings in the file:

>>> dset.attrs['strings'] = ["Hello", "Another string"]
>>> dset.attrs['strings']
array(['Hello', 'Another string'],
      dtype='|S14')

In contrast, you can specify the “variable-length string” special dtype (see Chapter 7):

>>> dt = h5py.special_dtype(vlen=str)
>>> dset.attrs.create('more_strings', ["Hello", "Another string"], dtype=dt)
>>> dset.attrs['more_strings']
array([Hello, Another string], dtype=object)
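The two approaches can also be compared from Python without leaving the interpreter. The following is a minimal sketch, not the book's own listing: it assumes a reasonably recent h5py (one that provides h5py.check_string_dtype), uses an in-memory file with the hypothetical name attrs_sketch.hdf5, and passes an explicit NumPy byte-string array for the fixed-length case so the result does not depend on how a given h5py version converts plain Python lists.

```python
import h5py
import numpy as np

# In-memory file (core driver, no backing store): nothing is written to disk.
# The file name is a placeholder chosen for this sketch.
with h5py.File("attrs_sketch.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("dataset", (100,), dtype="f4")

    # Fixed-length: NumPy sizes the byte strings to the longest entry (14 bytes).
    fixed = np.array([b"Hello", b"Another string"])
    dset.attrs.create("strings", fixed)

    # Variable-length: request the special string dtype explicitly.
    vlen_dt = h5py.special_dtype(vlen=str)
    dset.attrs.create("more_strings", ["Hello", "Another string"], dtype=vlen_dt)

    # Inspect the stored attribute types through the low-level attribute API.
    fixed_info = h5py.check_string_dtype(h5py.h5a.open(dset.id, b"strings").dtype)
    vlen_info = h5py.check_string_dtype(h5py.h5a.open(dset.id, b"more_strings").dtype)

print("strings length:", fixed_info.length)        # fixed width in bytes
print("more_strings length:", vlen_info.length)    # None means variable-length
```

A length of None marks the variable-length case; any integer is the fixed width in bytes.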

Looking at the file, the two attributes have subtly different storage techniques. The original attribute is stored as a pair of 14-byte fixed-length strings, while the other is stored as a pair of variable-length strings:

$ h5ls -vlr attrs_create.hdf5
Opened "attrs_create.hdf5" with sec2 driver.
/                        Group
    Location:  1:96
    Links:     1
/dataset                 Dataset {100/100}
    Attribute: more_strings {2}
        Type:      variable-length null-terminated ASCII string
        Data:  "Hello", "Another string"
    Attribute: strings {2}
        Type:      14-byte null-padded ASCII string
        Data:  "Hello" '\000' repeats 8 times, "Another string"
    Attribute: two_byte_int scalar
        Type:      native short
        Data:  190
    Location:  1:800
    Links:     1
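To see why the fixed-length attribute carries trailing nulls, it can help to mimic the padding by hand. The sketch below is a pure-Python illustration only, not HDF5's actual on-disk layout; the helper name pad_fixed is hypothetical. Every string is padded with null bytes to the width of the longest one, which is the idea behind a "14-byte null-padded ASCII string".

```python
def pad_fixed(strings, encoding="ascii"):
    """Null-pad each string to the byte length of the longest one."""
    encoded = [s.encode(encoding) for s in strings]
    width = max(len(b) for b in encoded)
    # bytes.ljust pads on the right with the given fill byte.
    return width, [b.ljust(width, b"\x00") for b in encoded]


width, padded = pad_fixed(["Hello", "Another string"])

print(width)                       # 14
print(padded[0].count(b"\x00"))    # 9
print(padded[1].count(b"\x00"))    # 0
```

The shorter string fills its 14-byte slot with five characters plus nine null bytes, while the longest string needs no padding at all. A variable-length string stores only its own bytes, so no such padding is required.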
