    with h5py.File('atomicdemo.hdf5', 'w', driver='mpio', comm=comm) as f:
        dset = f.create_dataset('x', (1,), dtype='i')
        if rank == 0:
            dset[0] = 42
        comm.Barrier()
        if rank == 1:
            print(dset[0])
If you answered "42," you're wrong. You might get 42, and you might get 0. This is one of the most irritating things about MPI from a consistency standpoint. The default write semantics do not guarantee that writes will have completed before Barrier returns and the program moves on. Why? Performance. Since MPI is typically used for huge, thousand-processor problems, people are willing to put up with relaxed consistency requirements to get every last bit of speed possible.
Starting with HDF5 1.8.9, there is a feature to get around this. You can enable MPI "atomic" mode for your file. This turns on a low-level feature that trades performance for strict consistency requirements. Among other things, it means that Barrier (and other MPI synchronization methods) interact with writes the way you expect. This modified program will always print "42":
    import h5py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.rank

    with h5py.File('atomicdemo.hdf5', 'w', driver='mpio', comm=comm) as f:
        f.atomic = True  # Enables strict atomic mode (requires HDF5 1.8.9+)
        dset = f.create_dataset('x', (1,), dtype='i')
        if rank == 0:
            dset[0] = 42
        comm.Barrier()
        if rank == 1:
            print(dset[0])
The trade-off, of course, is reduced performance. Generally the best solution is to avoid passing data from process to process through the file. MPI has great interprocess communication tools. Use them!