Finally, here are some tips to keep in mind when using HDF5's automatic type conversion. They apply both to reads with read_direct or astype and to writes from NumPy into existing datasets:
1. Generally, you can only convert between types of the same “flavor.” For example,
you can convert integers to floats, and floats to other floats, but not strings to floats
or integers. You'll get an unhelpful-looking IOError if you try.
2. When you're converting to a “smaller” type (float64 to float32, or "S10" to "S5"), HDF5 will truncate or “clip” the values:
>>> f.create_dataset('x', data=1e256, dtype=np.float64)
>>> print(f['x'][...])
1e+256
>>> f.create_dataset('y', data=1e256, dtype=np.float32)
>>> print(f['y'][...])
inf
There's no warning when this happens, so it's in your interest to keep track of the types
involved.
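Since HDF5 gives no warning when a float overflows during narrowing, one option is to check the range yourself before writing. The following is a minimal sketch, assuming NumPy; the helper name fits_in_float is hypothetical and not part of h5py. It only covers float targets and uses NumPy's np.finfo limits.

```python
import numpy as np

def fits_in_float(values, dtype):
    """Hypothetical helper: return True if every finite value in `values`
    lies within the finite range of the float type `dtype`."""
    limits = np.finfo(dtype)  # largest/smallest finite values of the target type
    arr = np.atleast_1d(np.asarray(values, dtype=np.float64))
    finite = arr[np.isfinite(arr)]  # inf/nan are already non-finite; skip them
    return bool(np.all(np.abs(finite) <= limits.max))
```

For example, checking fits_in_float(data, np.float32) before calling create_dataset(..., dtype=np.float32) would flag the 1e256 case above instead of silently storing inf.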
Reshaping an Existing Array
There's one more trick up our sleeve with create_dataset, although this one's a little more esoteric. You'll recall that it takes a “shape” argument as well as a dtype argument. As long as the total number of elements matches, you can specify a shape different from the shape of your input array.
Let's suppose we have an array that holds 100 images of 640×480 pixels, each stored as 480 rows of 640-element “scanlines”:
>>> imagedata.shape
(100, 480, 640)
Now suppose that we want to store each image as a “top half” and “bottom half” without needing to do the slicing when we read. When we go to create the dataset, we simply specify the new shape:
>>> f.create_dataset('newshape', data=imagedata, shape=(100, 2, 240, 640))
There's no performance penalty. As with NumPy's np.reshape, only the indices are shuffled around; the elements themselves keep the same order.
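Why does index [:, 0] of the new shape land on each image's top half? Both NumPy and HDF5 lay out elements in row-major (C) order, so the new shape is read exactly like np.reshape. This minimal NumPy-only sketch, assuming a small hypothetical stand-in for imagedata (3 images rather than 100), checks the element count and the index mapping. It does not touch HDF5 itself.

```python
import numpy as np

# Hypothetical stand-in for imagedata: 3 images of 480 rows x 640 columns.
n_images = 3
imagedata = np.arange(n_images * 480 * 640, dtype=np.int32).reshape(n_images, 480, 640)

new_shape = (n_images, 2, 240, 640)
# The new shape must hold exactly the same number of elements.
assert imagedata.size == np.prod(new_shape)

# Row-major reshape: axis 1 now selects the top (0) or bottom (1) half.
halves = imagedata.reshape(new_shape)
top = halves[:, 0]     # rows 0..239 of each image
bottom = halves[:, 1]  # rows 240..479 of each image
```

By the same reasoning, reading f['newshape'][:, 0] from the dataset created above should give the same values as imagedata[:, :240].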
Fill Values
If you create a brand-new dataset, you'll notice that by default it's zero-filled:
>>> dset = f.create_dataset('empty', (2, 2), dtype=np.int32)
>>> dset[...]