>>> dset = f.create_dataset('unlimited', (2,2), maxshape=(2, None))
>>> dset.shape
(2, 2)
>>> dset.maxshape
(2, None)
>>> dset.resize((2,3))
>>> dset.shape
(2, 3)
>>> dset.resize((2, 2**30))
>>> dset.shape
(2, 1073741824)
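A common use of an unlimited axis is appending data as it arrives: grow the axis with resize, then write into the new slot. The sketch below assumes an in-memory file (h5py's "core" driver with backing_store=False, so nothing touches disk); the file name, dataset name "log", and the appended values are hypothetical and chosen only for illustration.

```python
import numpy as np
import h5py

# In-memory HDF5 file: the name is required but nothing is written to disk.
with h5py.File("append_demo.h5", "w", driver="core", backing_store=False) as f:
    # Axis 0 is fixed at 2; axis 1 starts empty and is unlimited (None).
    # Supplying maxshape makes h5py use chunked storage automatically.
    dset = f.create_dataset("log", (2, 0), dtype=np.int64, maxshape=(2, None))

    # Hypothetical stream of 2-element columns to append one at a time.
    for step in range(5):
        column = np.array([step, step * step])
        n = dset.shape[1]
        dset.resize((2, n + 1))  # grow the unlimited axis by one
        dset[:, n] = column      # fill the slot that was just created

    result = dset[...]           # read everything back as a NumPy array
    print(dset.shape)            # (2, 5)
```

Resizing one element at a time is simple but can be slow for long streams; growing in larger blocks and trimming at the end is a common alternative.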
You can mark as many axes as you want as unlimited.
Finally, no matter what you put in maxshape, you can't change the total number of axes. This value, the rank of the dataset, is fixed and can never be changed:
>>> dset.resize((2,2,2))
TypeError: New shape length (3) must match dataset rank (2)
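The following sketch checks the rank rule programmatically and shows resize's axis keyword, which changes the length of a single axis while leaving the rank untouched. As before, the in-memory file and its name are assumptions made so the example is self-contained.

```python
import h5py

with h5py.File("rank_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("unlimited", (2, 2), maxshape=(2, None))

    # Attempting to change the number of axes is refused: rank is fixed.
    try:
        dset.resize((2, 2, 2))
        rank_change_failed = False
    except TypeError as err:
        rank_change_failed = True
        print("refused:", err)

    # Resizing one axis via axis= keeps the rank at 2.
    dset.resize(5, axis=1)
    final_shape = dset.shape
```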
Data Shuffling with resize
NumPy has a set of rules that apply when you change the shape of an array. For example, take a simple four-element square array with shape (2, 2):
>>> a = np.array([ [1,2], [3,4] ])
>>> a.shape
(2, 2)
>>> print(a)
[[1 2]
 [3 4]]
If we now resize it to (1, 4), keeping the total number of elements unchanged, the values are still there but rearrange themselves:
>>> a.resize((1,4))
>>> print(a)
[[1 2 3 4]]
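The rearrangement follows a simple rule: ndarray.resize keeps the flat, row-major (C-order) sequence of values and reinterprets it with the new shape. This minimal sketch checks that the flattened sequence is identical before and after, which is why the value 3 moves from position [1, 0] to [0, 2].

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
flat_before = a.flatten().tolist()  # row-major copy: [1, 2, 3, 4]

a.resize((1, 4))                    # in place; element count unchanged
flat_after = a.flatten().tolist()

# Same flat sequence, new (row, column) positions.
print(flat_before == flat_after)    # True
print(a[0, 2])                      # 3, formerly a[1, 0]
```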
And finally, if we resize it to (1, 10), adding new elements, the new ones are initialized to zero:
>>> a.resize((1,10))
>>> print(a)
[[1 2 3 4 0 0 0 0 0 0]]
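Zero-padding is specific to the in-place ndarray.resize method. The module-level function np.resize instead returns a new array and fills extra slots by repeating the original values in flat order. The sketch below contrasts the two on the same starting data.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Function form: new array, extra slots filled by repeating a's values.
repeated = np.resize(a, (1, 10))

# Method form: in place on a copy, extra slots filled with zeros.
b = a.copy()
b.resize((1, 10))

print(repeated)  # [[1 2 3 4 1 2 3 4 1 2]]
print(b)         # [[1 2 3 4 0 0 0 0 0 0]]
```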
If you've resized NumPy arrays before, you're likely used to this reshuffling behavior.
HDF5 has a different approach. No reshuffling is ever performed. Let's create a Dataset object to experiment on, which has both axes set to unlimited:
>>> dset = f.create_dataset('sizetest', (2,2), dtype=np.int32, maxshape=(None,None))
>>> dset[...] = [ [1,2], [3,4] ]
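To see what "no reshuffling" means in practice, the sketch below repeats the (1, 4) resize on an HDF5 dataset. Values keep their original indices: row 1 (holding 3 and 4) is cut off rather than moved, and the newly exposed columns read back as the fill value. The in-memory file and the explicit fillvalue=-1 are assumptions for illustration; -1 simply makes the new elements easy to tell apart from real data, whereas the dataset above uses the default fill value.

```python
import numpy as np
import h5py

with h5py.File("sizetest_demo.h5", "w", driver="core", backing_store=False) as f:
    # fillvalue=-1 is an illustrative assumption (the text uses the default).
    dset = f.create_dataset("sizetest", (2, 2), dtype=np.int32,
                            maxshape=(None, None), fillvalue=-1)
    dset[...] = [[1, 2], [3, 4]]

    # Shrink axis 0 and grow axis 1 in a single call.
    dset.resize((1, 4))
    after = dset[...]
    print(after)  # [[ 1  2 -1 -1]]
```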