Depending on your version of h5py, you may see a different result when you print the dtype; the details of how the “special” data is attached vary. Don't depend on any specific implementation. Always use the special_dtype function and don't try to piece one together yourself.
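The advice above can be sketched in a few lines. This is a minimal illustration, assuming Python 3 with h5py 2.x or 3.x: it builds the dtype with special_dtype and asks h5py.check_dtype what it means, instead of inspecting how the metadata is attached. It also shows why a hand-built dtype is a bad idea. A plain NumPy object dtype (the name homemade is just for illustration) carries none of the variable-length information.

```python
import h5py
import numpy as np

# Build the variable-length string dtype the supported way.
dt = h5py.special_dtype(vlen=str)

# Ask h5py what the dtype means rather than poking at its internals.
print(h5py.check_dtype(vlen=dt))        # the base type of the vlen dtype: str

# A hand-made object dtype has no vlen information attached.
homemade = np.dtype('O')
print(h5py.check_dtype(vlen=homemade))  # None
```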
Working with vlen String Datasets
You can use a “special” dtype to create a dataset in the normal fashion. Here we create a 100-element variable-length string dataset, using the dtype dt returned by h5py.special_dtype(vlen=str):
>>> dset = f.create_dataset('vlen_dataset', (100,), dtype=dt)
You can write strings into it from anything that looks “string-shaped,” including ordinary Python strings and fixed-length NumPy strings:
>>> dset[0] = "Hello"
>>> dset[1] = np.bytes_("Hello2")
>>> dset[3] = "X" * 10000
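For readers who want to run the session above end to end, here is a self-contained sketch. It assumes an in-memory HDF5 file (the core driver with backing_store=False, so nothing is written to disk) standing in for the open file f, and the file name vlen_demo.h5 is arbitrary. It writes a Python str, a NumPy bytes scalar, and a 10,000-character string into the same variable-length dataset.

```python
import h5py
import numpy as np

# Assumption: an in-memory file replaces the file `f` used in the text;
# with backing_store=False nothing is ever written to disk.
f = h5py.File('vlen_demo.h5', 'w', driver='core', backing_store=False)

dt = h5py.special_dtype(vlen=str)
dset = f.create_dataset('vlen_dataset', (100,), dtype=dt)

# Anything "string-shaped" can be written: a Python str, a NumPy bytes
# scalar, and strings of very different lengths in the same dataset.
dset[0] = "Hello"
dset[1] = np.bytes_("Hello2")
dset[3] = "X" * 10000
```

Because each element is stored with its own length, the 10,000-character entry does not force any other element to be padded.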
When you retrieve a single element, you get a Python string:
>>> out = dset[0]
>>> type(out)
str
When you retrieve more than one element, you get an object array full of Python strings:
>>> dset[0:2]
array(['Hello', 'Hello2'], dtype=object)
There's one caveat here: for technical reasons, the array returned has a plain-vanilla “object” dtype, not the fancy dtype we created from h5py.special_dtype:
>>> out = dset[0:1]
>>> out.dtype
dtype('object')
This is one of very few cases where dset[...].dtype != dset.dtype.
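The reading behavior described above can be checked with a short, self-contained sketch. It assumes Python 3 and an in-memory file (core driver), and it adds one wrinkle the Python 2 session does not show: h5py 3.x returns bytes for these reads unless you go through Dataset.asstr(), so the hypothetical helper as_text decodes bytes when it sees them and passes str through unchanged. Since the array you read back is a plain object array, the sketch asks the dataset's own dtype whether it holds variable-length strings.

```python
import h5py

# Hypothetical helper: h5py 2.x returns str here, h5py 3.x returns bytes
# (unless you read through Dataset.asstr()); accept either.
def as_text(value):
    return value.decode('utf-8') if isinstance(value, bytes) else value

with h5py.File('vlen_read.h5', 'w', driver='core', backing_store=False) as f:
    dt = h5py.special_dtype(vlen=str)
    dset = f.create_dataset('vlen_dataset', (100,), dtype=dt)
    dset[0] = "Hello"
    dset[1] = "Hello2"

    single = as_text(dset[0])              # one element: a single string
    several = dset[0:2]                    # a slice: a NumPy object array
    texts = [as_text(v) for v in several]  # decode each element if needed
    file_dtype = dset.dtype                # the dataset's own dtype

print(type(single).__name__, texts)        # str ['Hello', 'Hello2']
print(several.dtype.kind)                  # O (a plain object array)
print(h5py.check_dtype(vlen=file_dtype))   # str: query the dataset, not the array
```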
Byte Versus Unicode Strings
The preceding examples, like the rest of this topic, are written assuming you are using Python 2. However, in both Python 2 and 3 there exist two “flavors” of string you should be aware of: byte strings (str in Python 2, bytes in Python 3) and Unicode strings (unicode in Python 2, str in Python 3). They are stored in the file slightly differently, and this has implications for both internationalized applications and data portability.
A complete discussion of the bytes/Unicode mess in Python is beyond the scope of this
topic. However, it's important to discuss how the two types interact with HDF5.
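One way to see that the two flavors are stored differently is to ask h5py which character set each dataset records. The sketch below is an illustration only, assuming Python 3, h5py 2.10 or later (where h5py.check_string_dtype is available), and an in-memory file. A variable-length dtype built from bytes is recorded with the ASCII character set, while one built from str is recorded as UTF-8.

```python
import h5py

# Assumption: Python 3 and h5py >= 2.10 (check_string_dtype exists there).
with h5py.File('flavors.h5', 'w', driver='core', backing_store=False) as f:
    # Byte strings (Python 3 `bytes`) are stored with the ASCII character set...
    raw = f.create_dataset('raw', (1,), dtype=h5py.special_dtype(vlen=bytes))
    raw[0] = b"Hello"

    # ...while text strings (Python 3 `str`) are stored as UTF-8.
    text = f.create_dataset('text', (1,), dtype=h5py.special_dtype(vlen=str))
    text[0] = "Hello"

    # Read back the encoding h5py reports for each dataset's dtype.
    raw_enc = h5py.check_string_dtype(raw.dtype).encoding
    text_enc = h5py.check_string_dtype(text.dtype).encoding

print(raw_enc, text_enc)  # ascii utf-8
```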