Organizing Data with References, Types, and Dimension Scales - Python and HDF5


>>> ref_out = dset.regionref[10:90]

>>> ref_out

Like object references, region references are generally opaque. The only useful aspects

are the shape of the dataspace (the same as the parent dataset), and the shape of your

selection:

>>> dset.regionref.shape(ref_out)

(100,)

>>> dset.regionref.selection(ref_out)

(80,)

The second result is the shape of your selection; in other words, if you had applied your

slicing arguments directly to the dataset, it's the shape of the array that would have been

returned from HDF5.
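
The two queries above can be tried in isolation with a minimal sketch like the following. It assumes h5py and NumPy are installed; the file name regionref_demo.h5, the dataset name data, and the in-memory core driver are illustrative choices, not part of the original example.

```python
import h5py
import numpy as np

# In-memory file (core driver, nothing written to disk); names are illustrative.
with h5py.File("regionref_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("data", (100,), dtype="f4")

    # Slicing the regionref proxy records the selection instead of reading data.
    ref_out = dset.regionref[10:90]

    # Shape of the whole dataspace the reference points into.
    full_shape = dset.regionref.shape(ref_out)

    # Shape of the selected region itself.
    selected_shape = dset.regionref.selection(ref_out)

print(full_shape, selected_shape)
```

Both values are read while the file is still open, since the reference is resolved against the dataset it came from.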

Once you've got a region reference, you can use it directly as a slicing argument to

retrieve data from the dataset:

>>> data = dset[ref_out]

>>> data

array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,

0., 0.], dtype=float32)

>>> data.shape

(80,)
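
To see that reading through a region reference returns the same elements as the equivalent slice, here is a small sketch under the same assumptions as before (h5py and NumPy installed, an in-memory file, and illustrative names). The dataset is filled with distinct values so the comparison is meaningful; the original example uses zeros.

```python
import h5py
import numpy as np

# In-memory file; the file and dataset names are illustrative only.
with h5py.File("regionref_read_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("data", (100,), dtype="f4")
    dset[...] = np.arange(100, dtype="f4")  # distinct values 0..99

    ref_out = dset.regionref[10:90]

    via_ref = dset[ref_out]   # read using the stored selection
    via_slice = dset[10:90]   # read using the slice directly

same = np.array_equal(via_ref, via_slice)
print(via_ref.shape, same)
```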

Fancy Indexing

Keep in mind that if you're using “fancy” indexing methods (like Boolean arrays), then

the shape will always be one-dimensional. This mimics the behavior of NumPy for such

selections.
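
The NumPy behavior being mimicked can be checked without HDF5 at all. This is a plain NumPy sketch with an arbitrary 3x3 array and threshold chosen for illustration:

```python
import numpy as np

# A small two-dimensional array with values 0..8.
arr = np.arange(9, dtype="f4").reshape(3, 3)

# The Boolean mask has the same 2-D shape as the array.
mask = arr > 4

# Indexing with the mask flattens the result to one dimension.
picked = arr[mask]

print(mask.shape, picked.shape)
```

The mask keeps the two-dimensional shape of the array, but the selected elements come back as a flat, one-dimensional array.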

For example, suppose we had a little two-dimensional array, which we populated with

some random numbers:

>>> dset_random = f.create_dataset('small_example', (3, 3))

>>> dset_random[...] = np.random.random((3, 3))

>>> dset_random[...]

array([[ 0.32391435, 0.070962 , 0.57038087],

[ 0.1530778 , 0.22476801, 0.7758832 ],

[ 0.75768745, 0.73156554, 0.3228527 ]], dtype=float32)

We could create a Boolean array representing the entries greater than 0.5:

>>> index_arr = dset_random[...] > 0.5

>>> index_arr

array([[False, False, True],
