    with h5py.File('atomicdemo.hdf5', 'w', driver='mpio', comm=comm) as f:
        dset = f.create_dataset('x', (1,), dtype='i')
        if rank == 0:
            dset[0] = 42
        comm.Barrier()
        if rank == 1:
            print(dset[0])
If you answered "42," you're wrong. You might get 42, and you might get 0. This is one of the most irritating things about MPI from a consistency standpoint. The default write semantics do not guarantee that writes will have completed before Barrier returns and the program moves on. Why? Performance. Since MPI is typically used for huge, thousand-processor problems, people are willing to put up with relaxed consistency requirements to get every last bit of speed possible.
Starting with HDF5 1.8.9, there is a feature to get around this. You can enable MPI "atomic" mode for your file. This turns on a low-level feature that trades performance for strict consistency requirements. Among other things, it means that Barrier (and other MPI synchronization methods) interact with writes the way you expect. This modified program will always print "42":
    import h5py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.rank

    with h5py.File('atomicdemo.hdf5', 'w', driver='mpio', comm=comm) as f:
        f.atomic = True  # Enables strict atomic mode (requires HDF5 1.8.9+)
        dset = f.create_dataset('x', (1,), dtype='i')
        if rank == 0:
            dset[0] = 42
        comm.Barrier()
        if rank == 1:
            print(dset[0])
The trade-off, of course, is reduced performance. Generally the best solution is to avoid passing data from process to process through the file. MPI has great interprocess communication tools. Use them!