u'Accent: \xe9'
>>> print dset[1]
Accent: é
When you create this kind of dataset, the underlying HDF5 character set is set to
“UTF-8.” The only disadvantage is that some older third-party applications, like IDL,
may not be able to read your strings. If compatibility with legacy code like this is essential
for your application, make sure you test!
Remember that the default string type on Python 3, str, is actually a Unicode
string. So on Python 3, h5py.special_dtype(vlen=str) will give you
a UTF-8 dataset, not the compatible-with-everything ASCII dataset.
Use vlen=bytes instead to get an ASCII dataset.
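To make the UTF-8 versus ASCII distinction concrete, here is a small standard-library sketch (Python 3 syntax, no h5py involved). It shows what the accented string from the session above becomes when encoded as UTF-8, and why a reader that assumes plain ASCII cannot decode it. It illustrates the encodings only; it does not exercise HDF5, h5py, or any particular legacy application.

```python
# Python 3: this literal is a str, i.e. a Unicode text string.
text = "Accent: \xe9"

# A UTF-8 dataset holds the UTF-8 encoding of the text;
# "é" becomes the two-byte sequence 0xC3 0xA9.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'Accent: \xc3\xa9'

# Data bound for an ASCII dataset must already be bytes;
# pure-ASCII text encodes to one byte per character.
ascii_bytes = "Hello".encode("ascii")
print(type(ascii_bytes).__name__)  # bytes

# A reader that assumes ASCII cannot interpret the UTF-8 bytes for "é".
try:
    utf8_bytes.decode("ascii")
    ascii_reader_ok = True
except UnicodeDecodeError:
    ascii_reader_ok = False
print("ASCII reader can decode:", ascii_reader_ok)  # ASCII reader can decode: False
```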
Don't Store Binary Data in Strings!
Note also that HDF5 will allow you to store raw binary data using the “ASCII” dataset
dtype created with special_dtype(vlen=bytes). This may work, but it is generally considered
evil. And because of how the strings are handled internally, if your binary string
contains NUL bytes ("\x00"), it will be silently truncated!
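The truncation follows C string conventions, where a NUL byte marks the end of a string. The sketch below uses only the standard library's ctypes module to show that behavior on a byte string with an embedded NUL. It is an analogy for the internal handling the text describes, not a call into HDF5 or h5py.

```python
import ctypes

# Binary payload with an embedded NUL byte in the middle.
payload = b"abc\x00def"

# Copy it into a C char buffer. .raw exposes every byte, including the
# terminating NUL that create_string_buffer appends ...
buf = ctypes.create_string_buffer(payload)
print(buf.raw)    # b'abc\x00def\x00'

# ... but .value reads the buffer as a C string and stops at the first NUL.
print(buf.value)  # b'abc'
```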
The best way to store raw binary data is with the “opaque” type (see “Opaque Types” on
page 98).
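A minimal sketch of the opaque route, assuming NumPy is installed. h5py maps NumPy's void (`V`) dtype to the HDF5 opaque type, and recent h5py documentation suggests wrapping a binary blob with `np.void` before storing it (for example as an attribute, `dset.attrs["name"] = np.void(blob)`) and recovering it with `.tobytes()`. The code below runs only the NumPy half, so it needs no HDF5 file; it shows that the wrap-and-unwrap round trip keeps embedded NULs intact.

```python
import numpy as np

# Raw binary data that happens to contain NUL bytes.
blob = b"Hello\x00Hello\x00"

# Wrap it in a NumPy void scalar: an opaque "V" dtype whose
# item size equals the number of bytes in the blob.
wrapped = np.void(blob)
print(wrapped.dtype)

# Unwrapping returns the original bytes, NULs included.
restored = wrapped.tobytes()
print(restored == blob)  # True
```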
Future-Proofing Your Python 2 Application
Finally, here are some simple rules you can follow to keep the bytes/Unicode mess from
driving you mad. They will also help you when porting to Python 3 using the context-free
translation tool 2to3 that ships with Python.
1. Keep the text-versus-bytes distinction clear in your mind, and cleanly separate the
two in code.
2. Always use the alias bytes instead of str when you're sure you want a byte string.
For literals, you can even use the “b” prefix, for example, b"Hello". In particular,
when calling special_dtype to create a byte string, always use bytes.
3. For text strings use str, or better yet, unicode. Unicode literals are entered with a
leading “u”: u"Hello".
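One way to follow rule 1 is to convert only at the edges of a program: decode bytes into text as data comes in, encode text back to bytes on the way out, and keep text everywhere in between. The helpers `to_text` and `to_bytes` below are hypothetical names for this sketch, not part of h5py or the standard library. They rely only on the `bytes` alias and the `b"..."`/`u"..."` literal prefixes from rules 2 and 3, so the same code should run on Python 2.7 and on Python 3.3 or later (where the `u` prefix is accepted again).

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical boundary helper: decode bytes to text; pass text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Hypothetical boundary helper: encode text to bytes; pass bytes through."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


raw = b"Hello"           # rule 2: a byte string, marked with the "b" prefix
label = u"Accent: \xe9"  # rule 3: a text string, marked with the "u" prefix

text = to_text(raw)      # inbound: bytes -> text
out = to_bytes(label)    # outbound: text -> UTF-8 bytes
print(repr(text), repr(out))
```

Because both helpers pass through values that are already the right kind, calling them twice is harmless, which makes it easy to apply them defensively at every I/O boundary.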
Compound Types
For some kinds of data, it makes sense to bundle closely related values together into a
single element. The classic example is a C struct : multiple pieces of data that are handled