The h5py package (and PyTables) implement a few additional types on top of this system. Table 7-2 lists additions made by h5py that are described in this chapter.
Table 7-2. Additional Python-side types

Python type   NumPy expression       Stored as
Boolean       np.dtype("bool")       HDF5 enum with FALSE=0, TRUE=1
Complex       np.dtype("complex")    HDF5 compound with fields r and i
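As a quick check of the two types in Table 7-2, the following sketch stores a Boolean array and a complex array and reads them back. It assumes h5py and NumPy are installed; the in-memory file name `types_sketch.hdf5` and the dataset names `flags` and `waves` are illustrative choices, not names from the text.

```python
import h5py
import numpy as np

# In-memory file (core driver, no backing store) so nothing touches disk.
# The file and dataset names are hypothetical.
with h5py.File("types_sketch.hdf5", "w", driver="core", backing_store=False) as f:
    flags = f.create_dataset("flags", data=np.array([True, False, True]))
    waves = f.create_dataset("waves", data=np.array([1 + 2j, 3 - 4j]))

    # h5py maps the stored HDF5 types back to the NumPy types on read.
    flags_dtype = flags.dtype
    waves_dtype = waves.dtype
    flags_back = flags[...]
    waves_back = waves[...]

print(flags_dtype, waves_dtype)
```

Both arrays come back with the NumPy dtypes they were written with, so the HDF5-side representation stays out of sight in ordinary use.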
Integers and Floats
HDF5 supports all the NumPy integer sizes (1 byte to 8 bytes), signed and unsigned, little-endian and big-endian. Keep in mind that the default behavior for HDF5 when storing a too-large value in a too-small dataset is to clip, not to “roll over” like some versions of NumPy:

>>> import h5py
>>> import numpy as np
>>> f = h5py.File("typesdemo.hdf5", "a")
>>> dset = f.create_dataset('smallint', (10,), dtype=np.int8)
>>> dset[0] = 300
>>> dset[0]
127
>>> a = np.zeros((10,), dtype=np.int8)
>>> a[0] = 300
>>> a[0]
44
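The difference between clipping and rolling over can be reproduced in NumPy alone. This sketch is an illustration, not a description of HDF5 internals: `astype` performs the wraparound cast, and the hypothetical helper `clip_to_dtype` saturates values first, which mirrors the clipped result described above.

```python
import numpy as np

def clip_to_dtype(values, dtype):
    """Saturate values to the range of an integer dtype, then cast.

    Hypothetical helper for illustration; not part of h5py or NumPy.
    """
    info = np.iinfo(dtype)
    return np.clip(values, info.min, info.max).astype(dtype)

wide = np.array([300, -300, 100], dtype=np.int64)

wrapped = wide.astype(np.int8)          # keeps only the low 8 bits
clipped = clip_to_dtype(wide, np.int8)  # pins out-of-range values to the limits

print(wrapped)
print(clipped)
```

Clipping explicitly before writing makes the stored result independent of how a particular library version handles overflow.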
For floating-point numbers, HDF5 supports both single- and double-precision floats
(4 and 8 bytes respectively) out of the box.
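NumPy dtype strings make size, signedness, and byte order explicit. The short sketch below uses NumPy only; it lists the item sizes involved and shows how the same two bytes decode to different values under each byte order.

```python
import numpy as np

# Integer widths from 1 to 8 bytes, signed (i) and unsigned (u).
int_sizes = {code: np.dtype(code).itemsize for code in ("i1", "u2", "i4", "u8")}

# Single- and double-precision floats.
float_sizes = {code: np.dtype(code).itemsize for code in ("f4", "f8")}

# "<" is little-endian, ">" is big-endian: same bytes, different values.
raw = b"\x00\x01"
little = int(np.frombuffer(raw, dtype="<u2")[0])
big = int(np.frombuffer(raw, dtype=">u2")[0])

print(int_sizes, float_sizes, little, big)
```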
The HDF5 type representation system is very powerful, and among other things it can represent unusual floating-point precisions. “Half-precision” floats are an interesting case. These tiny 2-byte floats, available in NumPy as float16, are used for storage in applications like image and video processing, since they consume only half the space of the equivalent single-precision float. They're great where precision isn't that important and more dynamic range is needed than a 16-bit integer can provide.

>>> dset = f.create_dataset('half_float', (100,100,100), dtype=np.float16)
Keep in mind this is a storage format only; trying to do math on half-precision floats in NumPy will require casting and therefore be slow. Use Dataset.read_direct, the Dataset.astype context manager, or simply convert them after reading:

>>> a = dset[...]
>>> a = a.astype(np.float32)
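As a sketch of the `Dataset.read_direct` route, the destination array can be allocated as `float32` so that the conversion happens during the read. The in-memory file and the dataset name `half_demo` are assumptions made for this example.

```python
import h5py
import numpy as np

# Values chosen to be exactly representable in half precision.
source = np.array([[0.5, 1.0, 1.5], [2.0, 4.0, 8.0]], dtype=np.float16)

# In-memory file; the file and dataset names are hypothetical.
with h5py.File("half_sketch.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("half_demo", data=source)
    stored_dtype = dset.dtype

    # The destination is float32, so the values are converted while reading.
    out = np.empty(dset.shape, dtype=np.float32)
    dset.read_direct(out)

print(stored_dtype, out.dtype)
```

Reading straight into a preallocated `float32` array avoids holding a temporary `float16` copy of the data in memory.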
If your values fall roughly between 10^-8 and 60,000, and you aren't too bothered about precision, half-precision floats are a great way to save disk space.
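The range, precision, and storage figures can be checked with NumPy alone. The following is a minimal sketch; the sample values are chosen only for illustration.

```python
import numpy as np

half = np.finfo(np.float16)
largest = float(half.max)  # upper end of the representable range

# The smallest positive half-precision value is 2**-24; smaller inputs
# round to zero.
smallest_kept = float(np.float16(2.0**-24))
flushed = float(np.float16(2.0**-26))

# Precision: 0.1 is stored only approximately.
rounding_error = abs(float(np.float16(0.1)) - 0.1)

# Storage for a 100 x 100 x 100 dataset, in bytes.
n_items = 100 * 100 * 100
half_bytes = n_items * np.dtype(np.float16).itemsize
single_bytes = n_items * np.dtype(np.float32).itemsize

print(largest, smallest_kept, flushed, rounding_error, half_bytes, single_bytes)
```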