But if we rename the dataset, this quickly breaks:
>>> f.move('mydata', 'mydata2')
>>> out = f[grp1.attrs['dataset']]
KeyError: "unable to open object"
Using object references instead, we have:
>>> grp1.attrs['dataset'] = dset.ref
>>> grp1.attrs['dataset']
<HDF5 object reference>
>>> out = f[grp1.attrs['dataset']]
>>> out == dset
True
Moving the dataset yet again, the reference still resolves:
>>> f.move('mydata2', 'mydata3')
>>> out = f[grp1.attrs['dataset']]
>>> out == dset
True
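For readers who want to run the whole sequence in one go, here is a minimal sketch of the session above. The filename refs_demo.hdf5 and the sample data are assumptions chosen for illustration, and the file is kept in memory with the "core" driver so nothing is written to disk.

```python
import h5py
import numpy as np

# In-memory file: the "core" driver with backing_store=False never touches disk.
# The filename and the data below are illustrative assumptions.
with h5py.File("refs_demo.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("mydata", data=np.arange(5))
    grp1 = f.create_group("grp1")

    # Store an object reference instead of the path string "/mydata".
    grp1.attrs["dataset"] = dset.ref

    # Rename the dataset twice; the reference follows the object, not its name.
    f.move("mydata", "mydata2")
    f.move("mydata2", "mydata3")

    # Dereference by indexing the file with the stored reference.
    out = f[grp1.attrs["dataset"]]
    still_same = (out == dset)
    values = out[...].tolist()
    old_path_gone = "mydata" not in f
```

Had the attribute held the string "/mydata" instead, the lookup after the first move would have failed with a KeyError, as shown earlier.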
When you open an object by dereferencing, every now and then it's
possible that HDF5 won't be able to figure out the object's name. In
that case, obj.name will return None. It's less of a problem than it used
to be (HDF5 1.8 has gotten very good at figuring out names), but don't
be alarmed if you happen to get None.
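If your code logs or prints object names, one way to tolerate a missing name is a small guard like the following. This is only a sketch: display_name is a hypothetical helper, not part of h5py, and the stand-in objects exist purely to exercise it without an HDF5 file.

```python
from types import SimpleNamespace


def display_name(obj, fallback="<name unavailable>"):
    """Return obj.name, or a placeholder when the name is None.

    Hypothetical helper for illustration; not part of h5py.
    """
    name = obj.name
    return fallback if name is None else name


# Stand-ins for a dereferenced object with and without a resolvable name.
named = SimpleNamespace(name="/mydata3")
unnamed = SimpleNamespace(name=None)

label_named = display_name(named)
label_unnamed = display_name(unnamed)
```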
References as Data
References are full-fledged types in HDF5; you can freely use them in both attributes
and datasets. Obviously there's no native type in NumPy for references, so we once again
call on special_dtype for help, this time with the ref keyword:
>>> dt = h5py.special_dtype(ref=h5py.Reference)
>>> dt
dtype(('|O4', [(({'type': <type 'h5py.h5r.Reference'>}, 'ref'), '|O4')]))
That's a lot of metadata. But don't be dismayed; just like variable-length strings, this is
otherwise a regular object dtype:
>>> dt.kind
'O'
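To confirm what the dtype carries without reading its repr (which differs across h5py and NumPy versions), h5py's check_dtype can read the reference type back out. A short sketch:

```python
import h5py

# Build the reference dtype exactly as in the text.
dt = h5py.special_dtype(ref=h5py.Reference)

# check_dtype(ref=...) returns the reference class stored in the dtype's
# metadata, or None if the dtype is not a reference type.
ref_class = h5py.check_dtype(ref=dt)

# As far as NumPy is concerned, it is still an ordinary object dtype.
kind = dt.kind
```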
We can easily create datasets of Reference type:
>>> ref_dset = f.create_dataset("references", (10,), dtype=dt)
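As a rough sketch of how such a dataset might be used, the following writes one reference into it and resolves it again. The in-memory file, the filename ref_dataset_demo.hdf5, and the "target" dataset are assumptions made for the example.

```python
import h5py
import numpy as np

# In-memory file so the example leaves nothing on disk (illustrative filename).
with h5py.File("ref_dataset_demo.hdf5", "w", driver="core", backing_store=False) as f:
    dt = h5py.special_dtype(ref=h5py.Reference)

    # "target" is a hypothetical dataset for the reference to point at.
    target = f.create_dataset("target", data=np.arange(3))
    ref_dset = f.create_dataset("references", (10,), dtype=dt)

    # Elements are assigned like any other dataset element.
    ref_dset[0] = target.ref

    # Reading the element back yields a reference, which the file can resolve.
    resolved = f[ref_dset[0]]
    round_trip_ok = (resolved == target)
    resolved_values = resolved[...].tolist()
```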
What's in such a dataset? If we retrieve an uninitialized element, we get a zeroed or “null”
reference: