1. Do all your file I/O in the main process, but don't have files open when you invoke the multiprocessing features.
2. Multiple subprocesses can safely read from the same file, but only open it once the new process has been created.
3. Have each subprocess write to a different file, and merge them when finished.
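Strategy (3) can be sketched as follows. Plain text files stand in for HDF5 files here, and the file names, the squaring "computation," and the helper functions are all illustrative, not part of any real API:

```python
import os
from multiprocessing import Pool

def compute_and_write(args):
    """Worker: process its chunk and write results to its own file."""
    index, numbers = args
    fname = "part_%d.txt" % index          # one output file per worker
    with open(fname, "w") as f:
        for n in numbers:
            f.write("%d\n" % (n * n))      # stand-in for real processing
    return fname

def merge(filenames, out_name):
    """Parent: concatenate the per-worker files into one, then clean up."""
    with open(out_name, "w") as out:
        for fname in filenames:
            with open(fname) as f:
                out.write(f.read())
            os.remove(fname)

if __name__ == "__main__":
    chunks = [(0, [1, 2]), (1, [3, 4])]    # (worker index, data chunk)
    with Pool(2) as p:
        parts = p.map(compute_and_write, chunks)
    merge(parts, "result.txt")
```

Because no two processes ever have the same file open, there is no contention; the only serial step is the final merge.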
Figure 9-2 shows workflow (1). The initial process is responsible for file I/O, and communicates with the subprocesses through queues and other multiprocessing constructs.
Figure 9-2. Multiprocessing-based approach to using HDF5
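A minimal sketch of this pattern, assuming a single worker and hypothetical task/result queues (the row data and the summing "work" are placeholders; in practice only the main process would touch the HDF5 file):

```python
from multiprocessing import Process, Queue

def worker(tasks, results):
    """Subprocess: receive raw rows, send back processed values.

    The worker never opens the file itself; it only sees data
    the main process pulled out and placed on the queue.
    """
    while True:
        row = tasks.get()
        if row is None:            # sentinel: no more work
            break
        results.put(sum(row))      # stand-in for real processing

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    proc = Process(target=worker, args=(tasks, results))
    proc.start()
    # The main process alone would read rows from the HDF5 file here
    for row in [[1, 2], [3, 4]]:
        tasks.put(row)
    tasks.put(None)                # tell the worker to shut down
    out = [results.get() for _ in range(2)]
    proc.join()
    print(out)                     # [3, 7]
```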
One mechanism for “Pythonic” parallel computation is to use “process pools” that distribute the work among worker processes. These are instances of multiprocessing.Pool, which among other things have a parallel equivalent of the built-in map():
>>> from multiprocessing import Pool
>>> p = Pool(2)  # Create a 2-process pool
>>> words_in = ['hello', 'some', 'words']
>>> words_out = p.map(str.upper, words_in)
>>> print(words_out)
['HELLO', 'SOME', 'WORDS']
Here's an example of using HDF5 with Pool. Suppose we had a file containing a 1D dataset of coordinate pairs, and we wanted to compute their distance from the origin.
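Before bringing HDF5 into the picture, the computation itself can be sketched with a plain list standing in for the dataset; following strategy (1), only the main process would do the file reading, while the workers receive the coordinate pairs through Pool.map:

```python
import math
from multiprocessing import Pool

def distance(point):
    """Euclidean distance of an (x, y) pair from the origin."""
    x, y = point
    return math.sqrt(x * x + y * y)

if __name__ == "__main__":
    # Stand-in for the coordinate pairs read from the file
    # by the main process
    coords = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    with Pool(2) as p:
        result = p.map(distance, coords)
    print(result)  # [0.0, 5.0, 1.4142135623730951]
```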