More About Types - Python and HDF5

Depending on your version of h5py, you may see a different result when you print the dtype; the details of how the “special” data is attached vary. Don't depend on any specific implementation. Always use the special_dtype function and don't try to piece one together yourself.
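As a minimal sketch of that advice, the snippet below builds the dtype with `h5py.special_dtype` and then asks h5py what it means through the companion helper `h5py.check_dtype`, rather than reading anything off the printed form. It is written for Python 3 and assumes a reasonably recent h5py; the variable names are only illustrative.

```python
import h5py

# Build the variable-length string dtype through the public helper.
dt = h5py.special_dtype(vlen=str)

# To NumPy this is just an object dtype; the "special" part is extra
# information attached by h5py, whose representation may vary by version.
kind = dt.kind

# Ask h5py which element type the dtype stands for instead of
# inspecting the attached information directly.
base = h5py.check_dtype(vlen=dt)

print(kind, base)
```

Code that only ever goes through these two helpers should keep working even if the way the extra information is attached changes again.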

Working with vlen String Datasets

You can use a “special” dtype to create an array in the normal fashion. Here we create a 100-element variable-length string dataset:

>>> dset = f.create_dataset('vlen_dataset', (100,), dtype=dt)

You can write strings into it from anything that looks “string-shaped,” including ordinary Python strings and fixed-length NumPy strings:

>>> dset[0] = "Hello"
>>> dset[1] = np.string_("Hello2")
>>> dset[3] = "X" * 10000
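The writes above can be tried end to end with the following self-contained sketch. It makes a few assumptions that are not in the original text: Python 3, an in-memory file opened with the `core` driver so nothing touches disk, the placeholder file name `vlen_demo.h5`, and `np.bytes_` in place of `np.string_`, which was removed in NumPy 2.0.

```python
import h5py
import numpy as np

dt = h5py.special_dtype(vlen=str)

# In-memory file via the core driver; "vlen_demo.h5" is a placeholder name.
with h5py.File("vlen_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("vlen_dataset", (100,), dtype=dt)

    dset[0] = "Hello"                # ordinary Python string
    dset[1] = np.bytes_(b"Hello2")   # fixed-length NumPy byte string
    dset[3] = "X" * 10000            # elements can differ widely in length

    # Read each element back and record its length.
    lengths = [len(dset[i]) for i in (0, 1, 3)]

print(lengths)
```

Each element keeps its own length, which is the point of a variable-length dtype: the 10,000-character entry does not force padding onto the short ones.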

Retrieving a single element, you get a Python string:

>>> out = dset[0]
>>> type(out)
str

Retrieving more than one, you get an object array full of Python strings:

>>> dset[0:2]
array(['Hello', 'Hello2'], dtype=object)

There's one caveat here: for technical reasons, the array returned has a plain-vanilla “object” dtype, not the fancy dtype we created from h5py.special_dtype:

>>> out = dset[0:1]
>>> out.dtype
dtype('object')

This is one of very few cases where dset[...].dtype != dset.dtype.
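The read side can be sketched in the same way. This assumes Python 3 and an in-memory file with the placeholder name `vlen_demo.h5`. One behavior differs from the Python 2 session shown above: h5py 3.0 and later return `bytes` objects when reading variable-length strings, so the sketch normalizes the elements to `str` instead of assuming either type.

```python
import h5py

dt = h5py.special_dtype(vlen=str)

# In-memory file via the core driver; "vlen_demo.h5" is a placeholder name.
with h5py.File("vlen_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("vlen_dataset", (100,), dtype=dt)
    dset[0] = "Hello"
    dset[1] = "Hello2"

    out = dset[0:2]  # slicing returns a NumPy object array

    # The dataset itself still records that its elements are strings.
    stored_type = h5py.check_dtype(vlen=dset.dtype)

# Elements may be bytes (newer h5py) or str (older h5py); normalize to str.
texts = [s.decode("utf-8") if isinstance(s, bytes) else s for s in out]

print(out.dtype.kind, stored_type, texts)
```

When the element type matters downstream, it is safer to query the dataset's dtype with `h5py.check_dtype` than to rely on the dtype of the array that a slice returns.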

Byte Versus Unicode Strings

The preceding examples, like the rest of this topic, are written assuming you are using Python 2. However, in both Python 2 and 3 there exist two “flavors” of string you should be aware of. They are stored in the file slightly differently, and this has implications for both internationalized applications and data portability.

A complete discussion of the bytes/Unicode mess in Python is beyond the scope of this topic. However, it's important to discuss how the two types interact with HDF5.
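For readers who have not met the two flavors before, here is a small illustration in Python 3 syntax (an assumption, since the surrounding examples target Python 2). It involves no HDF5 at all and only shows why text and bytes are different things: the same word has one length as characters and another as encoded bytes.

```python
# A Unicode (text) string: four characters, the last one non-ASCII.
text = "caf\u00e9"

# The same text as a byte string; UTF-8 needs two bytes for the last character.
data = text.encode("utf-8")

n_chars = len(text)
n_bytes = len(data)

# Decoding with the matching encoding recovers the original text.
round_trip = data.decode("utf-8")

print(n_chars, n_bytes, round_trip == text)
```

Any storage format has to record which of the two it is holding, and, for bytes that represent text, which encoding was used.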
