The h5py package (like PyTables) implements a few additional types on top of this
system. Table 7-2 lists the additions made by h5py that are described in this chapter.
Table 7-2. Additional Python-side types

Python type    NumPy expression       Stored as
Boolean        np.dtype("bool")       HDF5 enum with FALSE=0, TRUE=1
Complex        np.dtype("complex")    HDF5 compound with fields r and i
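To make the two storage layouts in Table 7-2 concrete, here is a standard-library-only analogy. It is a rough sketch, not h5py's implementation: the `H5Bool` class and the `pack_complex`/`unpack_complex` helpers are hypothetical names, and the 16-byte layout assumes double-precision fields, which is what `np.dtype("complex")` maps to.

```python
import enum
import struct


class H5Bool(enum.IntEnum):
    """Hypothetical stand-in for the two-member enum in Table 7-2."""
    FALSE = 0
    TRUE = 1


def pack_complex(z):
    """Lay out a complex number as two consecutive doubles: field r, then field i."""
    return struct.pack("<dd", z.real, z.imag)


def unpack_complex(raw):
    """Rebuild the complex number from its r and i fields."""
    r, i = struct.unpack("<dd", raw)
    return complex(r, i)


raw = pack_complex(1.5 - 2.0j)
print(len(raw), unpack_complex(raw))   # record size in bytes, then the value read back
print(H5Bool(True) is H5Bool.TRUE)     # a Python bool maps onto the enum member
```

The point of the analogy is that neither type needs anything exotic on disk: a Boolean is a small integer with two named values, and a complex number is a two-field record.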
Integers and Floats
HDF5 supports all the NumPy integer sizes (1 byte to 8 bytes), signed and unsigned,
little-endian and big-endian. Keep in mind that the default behavior for HDF5 when
storing a too-large value in a too-small dataset is to clip, not to “roll over” like some
versions of NumPy:
>>> import numpy as np
>>> import h5py
>>> f = h5py.File("typesdemo.hdf5", "a")
>>> dset = f.create_dataset('smallint', (10,), dtype=np.int8)
>>> dset[0] = 300
>>> dset[0]
127
>>> a = np.zeros((10,), dtype=np.int8)
>>> a[0] = 300
>>> a[0]
44
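The difference between clipping and rolling over is plain integer arithmetic, so it can be sketched without HDF5 or NumPy. The helper names below are hypothetical and exist only for illustration; they model an 8-bit signed integer, the same width as the `smallint` dataset.

```python
# Pure-Python sketch; no HDF5 or NumPy involved. Helper names are hypothetical.
INT8_MIN, INT8_MAX = -128, 127


def clip_to_int8(value):
    """Saturate: out-of-range values stick at the nearest representable limit."""
    return max(INT8_MIN, min(INT8_MAX, value))


def wrap_to_int8(value):
    """Roll over: keep only the low 8 bits, read as a two's-complement number."""
    return (value - INT8_MIN) % 256 + INT8_MIN


for value in (100, 300, -300):
    print(value, clip_to_int8(value), wrap_to_int8(value))
```

Clipping at least keeps the stored value on the correct side of zero and as close to the original as the type allows; a wrapped value bears no obvious relation to the input.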
For floating-point numbers, HDF5 supports both single- and double-precision floats
(4 and 8 bytes respectively) out of the box.
The HDF5 type representation system is very powerful, and among other things it can
represent unusual floating-point precisions. “Half-precision” floats are an interesting
case. These tiny 2-byte floats, available in NumPy as float16, are used for storage in
applications like image and video processing, since they consume only half the space of
the equivalent single-precision float. They're great where precision isn't that important
and more dynamic range is needed than a 16-bit integer can provide.
>>> dset = f.create_dataset('half_float', (100, 100, 100), dtype=np.float16)
Keep in mind this is a storage format only; trying to do math on half-precision floats in
NumPy will require casting and therefore be slow. Use Dataset.read_direct, the
Dataset.astype context manager, or simply convert them after reading:
>>> a = dset[...]
>>> a = a.astype(np.float32)
But if you have values roughly between 10^-8 and 60,000, and aren't too bothered about
precision, they're a great way to save disk space.
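The size, precision, and range trade-off can be checked with Python's standard `struct` module, whose `e` format code is the same IEEE 754 half-precision layout. This is only a sketch of the number format, not of how h5py reads or writes data, and `roundtrip_half` is a hypothetical helper name.

```python
import struct


def roundtrip_half(value):
    """Store a Python float as a 2-byte IEEE 754 half and read it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Bytes per value for half, single, and double precision.
sizes = {code: struct.calcsize("<" + code) for code in ("e", "f", "d")}
print(sizes)

# Precision: only about three significant decimal digits survive.
stored = roundtrip_half(0.1)
print(stored, abs(stored - 0.1))

# Range: the largest finite half is 65504; the smallest positive one is 2**-24.
print(roundtrip_half(65504.0), roundtrip_half(2.0 ** -24))
```

A quick round trip like this on a few representative values is a cheap way to judge whether the loss of precision is acceptable before committing a large dataset to `float16`.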