comm = MPI.COMM_WORLD
rank = comm.rank
f = h5py.File('collective_test.hdf5', 'w', driver='mpio', comm=comm)
# RIGHT: All processes participate when creating an object
dset = f.create_dataset('x', (100,), 'i')
# WRONG: Only one process participating in a metadata operation
if rank == 0:
    dset.attrs['title'] = "Hello"
# RIGHT: Data I/O can be independent
if rank == 0:
    dset[0] = 42
# WRONG: All processes must participate in the same order
if rank == 0:
    f.attrs['a'] = 10
    f.attrs['b'] = 20
else:
    f.attrs['b'] = 20
    f.attrs['a'] = 10
When you violate this requirement, you generally won't get an exception; instead, various Bad Things will happen behind the scenes, possibly endangering your data.
Note that "collective" does not mean "synchronized." Although all processes in the preceding example call create_dataset, for example, they don't pause until the others catch up. The only requirements are that every process has to make the call, and in the same order.
Atomicity Gotchas
Sometimes, it's necessary to synchronize the state of multiple processes. For example, you might want to ensure that the first stage of a distributed calculation is finished before moving on to the next part. MPI provides a number of mechanisms to deal with this. The simplest is called "barrier synchronization": from the Python side, this is simply a function called Barrier that blocks until every process has reached the same point in the program.
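If you don't have an MPI stack handy, the blocking rendezvous that Barrier provides can be sketched with the standard library's threading.Barrier, which behaves analogously for threads. This is only an analogy to illustrate the semantics, not the mpi4py API: every participant blocks at the barrier until all of them arrive, so everything before the barrier is guaranteed to finish before anything after it starts.

```python
import threading

# threading.Barrier stands in for comm.Barrier() here: each of the
# three workers blocks at barrier.wait() until all three arrive.
barrier = threading.Barrier(3)
order = []
lock = threading.Lock()

def worker(rank):
    with lock:
        order.append(('A', rank))   # "stage A" work
    barrier.wait()                  # rendezvous: no one proceeds early
    with lock:
        order.append(('B', rank))   # "stage B" work

threads = [threading.Thread(target=worker, args=(r,)) for r in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every 'A' entry precedes every 'B' entry, whatever the interleaving.
print(all(stage == 'A' for stage, _ in order[:3]))  # True
```

Within each stage the interleaving is still arbitrary; the barrier only guarantees the ordering *between* stages, which is exactly what MPI's Barrier does for processes.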
Here's an example. This program generally prints “A” and “B” statements out of order:
from random import random
from time import sleep
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.rank