with h5py.File('result_index_%d.hdf5' % idx, 'w') as f:
    f['result'] = result
# Create our pool and carry out the computation
p = Pool(4)
p.map(distance_block, xrange(0, 1000, 100))
with h5py.File('coords.hdf5') as f:
    dset = f.create_dataset('distances', (1000,), dtype='f4')
    # Loop over our 100-element "chunks" and merge the data into coords.hdf5
    for idx in xrange(0, 1000, 100):
        filename = 'result_index_%d.hdf5' % idx
        with h5py.File(filename, 'r') as f2:
            data = f2['result'][...]
        dset[idx:idx+100] = data
        os.unlink(filename)  # no longer needed
That looks positively exhausting, mainly because of the limitations on passing open files
to child processes. What if there were a way to share a single file between processes,
automatically synchronizing reads and writes? It turns out there is: Parallel HDF5.
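The limitation is easy to verify directly: Pool.map pickles every argument it sends to a worker process, and open file objects cannot be pickled. A quick stdlib-only check (using os.devnull so no real file is created):

```python
import os
import pickle

f = open(os.devnull, 'w')
try:
    pickle.dumps(f)    # Pool.map pickles arguments before sending them
    sharable = True
except TypeError:      # open file objects refuse to be pickled
    sharable = False
f.close()

print(sharable)  # False: the handle can't cross the process boundary
```

This is why each worker above receives a bare integer index and opens its own files by name.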
MPI and Parallel HDF5
Figure 9-3 shows how an application works using Parallel HDF5, in contrast to the threading and multiprocessing approaches earlier. MPI-based applications work by launching multiple parallel instances of the Python interpreter. Those instances communicate with each other via the MPI library. The key difference compared to multiprocessing is that the processes are peers, unlike the child processes used for the Pool objects we saw earlier.