1. Do all your file I/O in the main process, but don't have files open when you invoke the multiprocessing features.
2. Multiple subprocesses can safely read from the same file, but only open it once the new process has been created.
3. Have each subprocess write to a different file, and merge them when finished.
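Strategy (3) can be sketched as follows. Plain text files stand in for HDF5 files here, and the file names, the squaring "computation," and the helper functions are all illustrative, not part of any real API:

```python
import os
from multiprocessing import Pool

def compute_and_write(args):
    """Worker: process its chunk and write results to its own file."""
    index, numbers = args
    fname = "part_%d.txt" % index          # one output file per worker
    with open(fname, "w") as f:
        for n in numbers:
            f.write("%d\n" % (n * n))      # stand-in for real processing
    return fname

def merge(filenames, out_name):
    """Parent: concatenate the per-worker files into one, then clean up."""
    with open(out_name, "w") as out:
        for fname in filenames:
            with open(fname) as f:
                out.write(f.read())
            os.remove(fname)

if __name__ == "__main__":
    chunks = [(0, [1, 2]), (1, [3, 4])]    # (worker index, data chunk)
    with Pool(2) as p:
        parts = p.map(compute_and_write, chunks)
    merge(parts, "result.txt")
```

Because no two processes ever have the same file open, there is no contention; the only serial step is the final merge.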
Figure 9-2 shows workflow (1). The initial process is responsible for file I/O, and communicates with the subprocesses through queues and other multiprocessing constructs.
Figure 9-2. Multiprocessing-based approach to using HDF5
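A minimal sketch of this pattern, assuming a single worker and hypothetical task/result queues (the row data and the summing "work" are placeholders; in practice only the main process would touch the HDF5 file):

```python
from multiprocessing import Process, Queue

def worker(tasks, results):
    """Subprocess: receive raw rows, send back processed values.

    The worker never opens the file itself; it only sees data
    the main process pulled out and placed on the queue.
    """
    while True:
        row = tasks.get()
        if row is None:            # sentinel: no more work
            break
        results.put(sum(row))      # stand-in for real processing

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    proc = Process(target=worker, args=(tasks, results))
    proc.start()
    # The main process alone would read rows from the HDF5 file here
    for row in [[1, 2], [3, 4]]:
        tasks.put(row)
    tasks.put(None)                # tell the worker to shut down
    out = [results.get() for _ in range(2)]
    proc.join()
    print(out)                     # [3, 7]
```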
One mechanism for “Pythonic” parallel computation is to use “process pools” that distribute the work among worker processes. These are instances of multiprocessing.Pool, which among other things have a parallel equivalent of the built-in map():
>>> from multiprocessing import Pool
>>> p = Pool(2)  # Create a 2-process pool
>>> words_in = ['hello', 'some', 'words']
>>> words_out = p.map(str.upper, words_in)
>>> print(words_out)
['HELLO', 'SOME', 'WORDS']
Here's an example of using HDF5 with Pool. Suppose we had a file containing a 1D dataset of coordinate pairs, and we wanted to compute their distance from the origin.
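Before bringing HDF5 into the picture, the computation itself can be sketched with a plain list standing in for the dataset; following strategy (1), only the main process would do the file reading, while the workers receive the coordinate pairs through Pool.map:

```python
import math
from multiprocessing import Pool

def distance(point):
    """Euclidean distance of an (x, y) pair from the origin."""
    x, y = point
    return math.sqrt(x * x + y * y)

if __name__ == "__main__":
    # Stand-in for the coordinate pairs read from the file
    # by the main process
    coords = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    with Pool(2) as p:
        result = p.map(distance, coords)
    print(result)  # [0.0, 5.0, 1.4142135623730951]
```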