string as an attribute. That created a variable-length ASCII string (“Variable-Length Strings” on page 89). In contrast, an instance of np.string_ would get stored as a fixed-length string in the file:
>>> dset.attrs['title_fixed'] = np.string_("Another title")
This generally isn't an issue, but some older FORTRAN-based programs can't deal with variable-length strings. If this is a problem for your application, use np.string_, or equivalently, arrays of NumPy type S.
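As a rough illustration of what "NumPy type S" means, the sketch below uses only NumPy and does not touch HDF5. It uses np.bytes_, the name under which np.string_ is available in current NumPy releases (np.string_ was an alias that NumPy 2.0 removed). The variable names are made up for this example.

```python
import numpy as np

# np.bytes_ is the scalar type behind the "S" dtype family.
# (Older code spells this np.string_.)
title = np.bytes_(b"Another title")

# The dtype records the fixed width in bytes: 13 characters, 13 bytes.
print(title.dtype)           # S13
print(title.dtype.itemsize)  # 13

# An array of type "S" sizes itself to the longest element;
# shorter entries are padded with null bytes in storage.
titles = np.array([b"Another title", b"Short"], dtype="S")
print(titles.dtype)          # S13
```

The 13-byte width is what shows up as a fixed-length string type when such a value is stored as an attribute.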
By the way, you can also store Unicode strings in the file. They're written out with the
HDF5-approved UTF-8 encoding:
>>> dset.attrs['Yet another title'] = u'String with accent (\u00E9)'
>>> f.flush()
Here's what the file looks like now, with our fixed-length and Unicode strings inside:
$ h5ls -vlr attrsdemo.hdf5/dataset
Opened "attrsdemo.hdf5" with sec2 driver.
dataset                  Dataset {100/100}
    Attribute: Yet\ another\ title scalar
        Type:      variable-length null-terminated UTF-8 string
        Data:  "String with accent (\37777777703\37777777651)"
    Attribute: ones scalar
        Type:      object reference
        Data:  DATASET-1:70568
    Attribute: run_id scalar
        Type:      native int
        Data:  144
    Attribute: sample_rate scalar
        Type:      native double
        Data:  1e+08
    Attribute: title scalar
        Type:      variable-length null-terminated ASCII string
        Data:  "Dataset from third round of experiments"
    Attribute: title_fixed scalar
        Type:      13-byte null-padded ASCII string
        Data:  "Another title"
    Location:  1:800
    Links:     1
    Storage:   400 logical bytes, 0 allocated bytes
    Type:      native float
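The odd-looking \37777777703\37777777651 in the Data line of the Unicode attribute is consistent with the two UTF-8 bytes of the accented character, each printed as a sign-extended 32-bit value in octal. That reading is inferred from the numbers themselves, not from h5ls documentation, and the helper name below is hypothetical. A minimal pure-Python sketch of the arithmetic:

```python
# The accented character from the attribute value, encoded as UTF-8.
accent_bytes = u'\u00E9'.encode('utf-8')   # two bytes: 0xC3 0xA9


def sign_extended_octal(byte):
    """Treat one byte as a signed char, widen it to 32 bits, format as octal.

    This mimics what the h5ls output appears to do; it is an assumption.
    """
    signed = byte - 256 if byte >= 128 else byte
    return '\\' + format(signed & 0xFFFFFFFF, 'o')


escaped = ''.join(sign_extended_octal(b) for b in accent_bytes)
print(escaped)
```

The printed escape sequence matches the one in the listing, which suggests the attribute really does hold the UTF-8 encoding of the string.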
There is one more thing to mention about strings, and it has to do with the strict separation in Python 3 between byte strings and text strings.
When you read an attribute from a file, you generally get an object with the same type
as in HDF5. So if we were to store a NumPy int32, we would get an int32 back.
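The int32 round trip can be checked with a short sketch. It assumes h5py and NumPy are installed, and it uses an in-memory file (the "core" driver with backing_store=False) so nothing is written to disk. The file and dataset names are placeholders for this example.

```python
import h5py
import numpy as np

# In-memory HDF5 file; the name is a placeholder and no file is created on disk.
with h5py.File("attrs_roundtrip_demo.hdf5", "w",
               driver="core", backing_store=False) as f:
    dset = f.create_dataset("dataset", (100,), dtype="f4")

    # Store a NumPy int32 as an attribute, then read it back.
    dset.attrs["run_id"] = np.int32(144)
    run_id = dset.attrs["run_id"]

# The value comes back as a NumPy scalar with the same integer type.
print(type(run_id), run_id.dtype)
```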