Looking at the file with h5ls:
$ h5ls -vlr attrs_create.hdf5
Opened "attrs_create.hdf5" with sec2 driver.
/ Group
Location: 1:96
Links: 1
/dataset Dataset {100/100}
Attribute: two_byte_int scalar
Type: native short
Data: 190
Location: 1:800
Links: 1
Storage: 400 logical bytes, 0 allocated bytes
Type: native float
This is a great way to make sure you get the right flavor of string. Unlike scalar strings,
an array-like object of strings is, by default, sent through NumPy and ends up as
fixed-length strings in the file:
>>> dset.attrs['strings'] = ["Hello", "Another string"]
>>> dset.attrs['strings']
array(['Hello', 'Another string'],
dtype='|S14')
In contrast, you can specify the “variable-length string” special dtype (see Chapter 7):
>>> dt = h5py.special_dtype(vlen=str)
>>> dset.attrs.create('more_strings', ["Hello", "Another string"], dtype=dt)
>>> dset.attrs['more_strings']
array([Hello, Another string], dtype=object)
Looking at the file, the two attributes have subtly different storage techniques. The
original attribute is stored as a pair of 14-byte fixed-length strings, while the other is
stored as a pair of variable-length strings:
$ h5ls -vlr attrs_create.hdf5
Opened "attrs_create.hdf5" with sec2 driver.
/ Group
Location: 1:96
Links: 1
/dataset Dataset {100/100}
Attribute: more_strings {2}
Type: variable-length null-terminated ASCII string
Data: "Hello", "Another string"
Attribute: strings {2}
Type: 14-byte null-padded ASCII string
Data: "Hello" '\000' repeats 8 times, "Another string"
Attribute: two_byte_int scalar
Type: native short
Data: 190
Location: 1:800
Links: 1
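To make the difference between the two layouts concrete, here is a minimal pure-Python sketch of what "14-byte null-padded" and "variable-length null-terminated" mean for these two strings. It works on plain bytes objects and does not touch HDF5 at all; the names encoded, width, fixed, and variable are illustrative only, and the real on-disk format carries additional bookkeeping that this sketch ignores.

```python
# Illustration only: plain Python bytes, not an actual HDF5 file.
strings = ["Hello", "Another string"]

# Both strings are pure ASCII, matching the types h5ls reports.
encoded = [s.encode("ascii") for s in strings]

# Fixed-length: every element takes the width of the longest string,
# and shorter strings are padded out with NUL bytes.
width = max(len(b) for b in encoded)
fixed = [b.ljust(width, b"\x00") for b in encoded]

# Variable-length, null-terminated: each element keeps its own length
# and is followed by a single terminating NUL byte.
variable = [b + b"\x00" for b in encoded]

print("fixed width:", width)
print("fixed sizes:", [len(b) for b in fixed])
print("variable sizes:", [len(b) for b in variable])
print("padded element:", fixed[0])
```

The fixed-length form spends the same number of bytes on every element, which is why NumPy chooses a width of 14 to fit the longer string; the variable-length form sizes each element individually.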