Working with Datasets - Python and HDF5


>>> dset = f.create_dataset('unlimited', (2, 2), maxshape=(2, None))
>>> dset.shape
(2, 2)
>>> dset.maxshape
(2, None)
>>> dset.resize((2, 3))
>>> dset.shape
(2, 3)
>>> dset.resize((2, 2**30))
>>> dset.shape
(2, 1073741824)

You can mark as many axes as you want as unlimited.

Finally, no matter what you put in maxshape, you can't change the total number of axes. This value, the rank of the dataset, is fixed when the dataset is created:

>>> dset.resize((2, 2, 2))
TypeError: New shape length (3) must match dataset rank (2)
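The behavior above can be tried end to end with a minimal sketch like the following. It assumes h5py and NumPy are installed; the file name resize_demo.hdf5 and the dataset name both_unlimited are made up for illustration, and the file lives only in memory. The exact exception type and message raised on a rank mismatch may differ between h5py versions, so the sketch catches both TypeError and ValueError.

```python
import h5py
import numpy as np

# In-memory file: the 'core' driver with backing_store=False writes nothing to disk.
# The file and dataset names here are hypothetical.
with h5py.File('resize_demo.hdf5', 'w', driver='core', backing_store=False) as f:
    # None in maxshape marks an axis as unlimited; here both axes are.
    dset = f.create_dataset('both_unlimited', (2, 2), dtype=np.int32,
                            maxshape=(None, None))

    # Growing along either unlimited axis is allowed.
    dset.resize((5, 7))
    grown_shape = dset.shape

    # Changing the number of axes (the rank) is rejected.
    rank_error = None
    try:
        dset.resize((2, 2, 2))
    except (TypeError, ValueError) as exc:  # type varies by h5py version
        rank_error = exc

    final_shape = dset.shape
    final_maxshape = dset.maxshape

print(grown_shape)
print(type(rank_error).__name__, rank_error)
print(final_shape, final_maxshape)
```

The failed call leaves the dataset untouched: its shape is still the one set by the last successful resize.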

Data Shuffling with resize

NumPy has a set of rules that apply when you change the shape of an array. For example, take a simple four-element square array with shape (2, 2):

>>> a = np.array([[1, 2], [3, 4]])
>>> a.shape
(2, 2)
>>> print(a)
[[1 2]
 [3 4]]

If we now resize it to (1, 4), keeping the total number of elements unchanged, the values are still there but rearrange themselves:

>>> a.resize((1, 4))
>>> print(a)
[[1 2 3 4]]

And finally, if we resize it to (1, 10), adding new elements, the new ones are initialized to zero:

>>> a.resize((1, 10))
>>> print(a)
[[1 2 3 4 0 0 0 0 0 0]]
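The same steps can be run as a standalone script, sketched below under the assumption that NumPy is installed. The second array b is an addition for illustration, not part of the original session: growing a (2, 2) array to (3, 3) makes the reshuffling visible, because the value 3 moves from row 1 to row 0. Passing refcheck=False is a precaution, since the in-place resize can otherwise be refused when something else holds a reference to the array.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# In-place resize keeps the values in flattened (row-major) order.
# refcheck=False skips the reference check, which can otherwise
# raise ValueError when other references to the array exist.
a.resize((1, 4), refcheck=False)
print(a)

# Growing the array appends zero-filled elements at the end.
a.resize((1, 10), refcheck=False)
print(a)

# Hypothetical extra case: growing both axes shows the reshuffle.
# The flat data 1, 2, 3, 4 is laid out again across rows of length 3.
b = np.array([[1, 2], [3, 4]])
b.resize((3, 3), refcheck=False)
print(b)
```

In the last step the element that sat at b[1, 0] ends up at b[0, 2], because NumPy treats the data as one flat sequence and simply refills the new shape from it.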

If you've reshaped NumPy arrays before, you're likely used to this reshuffling behavior.

HDF5 has a different approach. No reshuffling is ever performed. Let's create a Dataset object to experiment on, which has both axes set to unlimited:

>>> dset = f.create_dataset('sizetest', (2, 2), dtype=np.int32, maxshape=(None, None))
>>> dset[...] = [[1, 2], [3, 4]]
