Looking at the file in h5ls:
$ h5ls -vlr attrs_create.hdf5
Opened "attrs_create.hdf5" with sec2 driver.
/ Group
Location: 1:96
Links: 1
/dataset Dataset {100/100}
Attribute: two_byte_int scalar
Type: native short
Data: 190
Location: 1:800
Links: 1
Storage: 400 logical bytes, 0 allocated bytes
Type: native float
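The same facts can be read back from Python instead of h5ls. The sketch below is a reconstruction under assumptions: it guesses that `/dataset` holds 100 single-precision floats (which matches the 400 logical bytes and "native float" above) and that `two_byte_int` was made with `attrs.create` and an explicit 2-byte integer type. It writes to a temporary directory rather than touching `attrs_create.hdf5`.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical reconstruction of the file listed above, in a scratch directory
path = os.path.join(tempfile.mkdtemp(), "attrs_create.hdf5")

with h5py.File(path, "w") as f:
    # Assumed layout: 100 float32 elements -> 100 * 4 = 400 logical bytes
    dset = f.create_dataset("dataset", (100,), dtype="f4")
    # Explicit 2-byte integer type ("native short" in h5ls terms)
    dset.attrs.create("two_byte_int", 190, dtype="i2")

with h5py.File(path, "r") as f:
    dset = f["dataset"]
    value = dset.attrs["two_byte_int"]  # scalar attribute -> NumPy scalar
    logical_bytes = dset.size * dset.dtype.itemsize
    print(value, value.dtype, logical_bytes)  # prints: 190 int16 400
```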
This is a great way to make sure you get the right flavor of string. Unlike scalar strings, an array-like object of strings is by default sent through NumPy and ends up stored as fixed-length strings in the file:
>>> dset.attrs['strings'] = ["Hello", "Another string"]
>>> dset.attrs['strings']
array(['Hello', 'Another string'],
      dtype='|S14')
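The width comes from NumPy: it picks the smallest fixed size that fits the longest item, 14 bytes for "Another string". A minimal sketch that makes this step explicit by building the `|S14` array yourself, so the attribute is fixed-length without depending on how a particular h5py version converts a plain list (newer releases may store a list of `str` as variable-length strings instead, so it is worth checking on your installation). The file name is hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "fixed_strings.hdf5")  # hypothetical name

# NumPy sizes the fixed-width bytes type to the longest element
names = np.array([b"Hello", b"Another string"])
print(names.dtype)  # |S14

with h5py.File(path, "w") as f:
    dset = f.create_dataset("dataset", (100,), dtype="f4")
    dset.attrs["strings"] = names

with h5py.File(path, "r") as f:
    stored = f["dataset"].attrs["strings"]
    # For a fixed-length string type, the reported length is the width in bytes
    info = h5py.check_string_dtype(f["dataset"].attrs.get_id("strings").dtype)
    print(stored, info)
```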
In contrast, if you specify the “variable-length string” special dtype (see Chapter 7):
>>> dt = h5py.special_dtype(vlen=str)
>>> dset.attrs.create('more_strings', ["Hello", "Another string"], dtype=dt)
>>> dset.attrs['more_strings']
array(['Hello', 'Another string'], dtype=object)
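A self-contained version of the variable-length case, assuming a throwaway file and a 100-element float dataset (both hypothetical). It uses `h5py.special_dtype(vlen=str)` exactly as in the session above; recent h5py releases also provide `h5py.string_dtype()` for the same job. On read, the elements come back as Python objects, and `check_string_dtype` reports no fixed length.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "vlen_strings.hdf5")  # hypothetical name

# Variable-length string type, as in the interactive session
dt = h5py.special_dtype(vlen=str)

with h5py.File(path, "w") as f:
    dset = f.create_dataset("dataset", (100,), dtype="f4")
    dset.attrs.create("more_strings", ["Hello", "Another string"], dtype=dt)

with h5py.File(path, "r") as f:
    stored = f["dataset"].attrs["more_strings"]
    # length is None for variable-length strings
    info = h5py.check_string_dtype(f["dataset"].attrs.get_id("more_strings").dtype)
    print(stored, info)
```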
Looking at the file, the two string attributes are stored in subtly different ways. The
original attribute is stored as a pair of 14-byte fixed-length strings, while the other is
stored as a pair of variable-length strings:
$ h5ls -vlr attrs_create.hdf5
Opened "attrs_create.hdf5" with sec2 driver.
/ Group
Location: 1:96
Links: 1
/dataset Dataset {100/100}
Attribute: more_strings {2}
Type: variable-length null-terminated ASCII string
Data: "Hello", "Another string"
Attribute: strings {2}
Type: 14-byte null-padded ASCII string
Data: "Hello" '\000' repeats 8 times, "Another string"
Attribute: two_byte_int scalar
Type: native short
Data: 190
Location: 1:800
Links: 1
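The same comparison can be made from Python without h5ls. The sketch below rebuilds the three attributes under assumed names and types (a temporary file, a float32 dataset, and a NumPy bytes array standing in for the fixed-length case), then reports for each attribute whether HDF5 holds it as a fixed-width string, a variable-length string, or not a string at all. The helper `describe` is hypothetical, not part of h5py.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "attrs_create.hdf5")  # scratch copy


def describe(attrs, name):
    """Hypothetical helper: summarize how one attribute is typed in the file."""
    aid = attrs.get_id(name)  # low-level AttrID
    info = h5py.check_string_dtype(aid.dtype)
    if info is None:
        return f"{name}: not a string, dtype {aid.dtype}"
    if info.length is None:
        return f"{name}: shape {aid.shape}, variable-length {info.encoding} strings"
    return (f"{name}: shape {aid.shape}, "
            f"{info.length}-byte fixed-length {info.encoding} strings")


with h5py.File(path, "w") as f:
    dset = f.create_dataset("dataset", (100,), dtype="f4")
    dset.attrs.create("two_byte_int", 190, dtype="i2")
    dset.attrs["strings"] = np.array([b"Hello", b"Another string"])
    dset.attrs.create("more_strings", ["Hello", "Another string"],
                      dtype=h5py.special_dtype(vlen=str))

with h5py.File(path, "r") as f:
    attrs = f["dataset"].attrs
    report = [describe(attrs, n) for n in attrs]
    for line in report:
        print(line)
```

Working from the type metadata rather than the values is the point here: both string attributes read back as "Hello" and "Another string", and only the stored type reveals the difference h5ls shows.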