    with h5py.File('result_index_%d.hdf5' % idx, 'w') as f:
        f['result'] = result

# Create our pool and carry out the computation
p = Pool(4)
p.map(distance_block, xrange(0, 1000, 100))

with h5py.File('coords.hdf5') as f:
    dset = f.create_dataset('distances', (1000,), dtype='f4')

    # Loop over our 100-element "chunks" and merge the data into coords.hdf5
    for idx in xrange(0, 1000, 100):
        filename = 'result_index_%d.hdf5' % idx
        with h5py.File(filename, 'r') as f2:
            data = f2['result'][...]
        dset[idx:idx+100] = data
        os.unlink(filename)  # no longer needed
That looks positively exhausting, mainly because of the limitations on passing open files
to child processes. What if there were a way to share a single file between processes,
automatically synchronizing reads and writes? It turns out there is: Parallel HDF5.
MPI and Parallel HDF5
Figure 9-3 shows how an application works using Parallel HDF5, in contrast to the threading and multiprocessing approaches earlier. MPI-based applications work by launching multiple parallel instances of the Python interpreter. Those instances communicate with each other via the MPI library. The key difference compared to multiprocessing is that the processes are peers, unlike the child processes used for the Pool objects we saw earlier.
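To make this concrete, here is a minimal sketch of what such a program can look like using mpi4py together with h5py's MPI-IO ("mpio") driver. It assumes h5py was built with parallel HDF5 support; the file name parallel_demo.hdf5 and dataset name 'ranks' are only illustrative. Every MPI process opens the same file and writes its own slot in a shared dataset, with no merge step afterward:

from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD

# All processes open the *same* file through the MPI-IO driver
with h5py.File('parallel_demo.hdf5', 'w', driver='mpio', comm=comm) as f:

    # Dataset creation is collective: every process participates
    dset = f.create_dataset('ranks', (comm.size,), dtype='i')

    # Each process writes only its own element; HDF5 and MPI
    # coordinate the underlying reads and writes
    dset[comm.rank] = comm.rank

Such a script is launched with an MPI runner, for example mpiexec -n 4 python demo.py, which starts four peer interpreters rather than one parent and several children.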