Databases Reference
In-Depth Information
CHAPTER 9
Concurrency: Parallel HDF5, Threading,
and Multiprocessing
Over the past 10 years or so, parallel code has become crucial to scientific programming.
Nearly every modern computer has at least two cores; dedicated workstations are readily
available with 12 or more. Plunging hardware prices have made 100-core clusters fea‐
sible even for small research groups.
As a rapid development language, and one with easy access to C and FORTRAN libra‐
ries, Python is increasingly being used as a top-level “glue” language for such platforms.
Scientific programs written in Python can leverage existing “heavy-lifting” libraries
written in C or FORTRAN using any number of mechanisms, from ctypes to Cython
to the built-in NumPy routines.
This chapter discusses the various mechanisms in Python for writing parallel code, and
how they interact with HDF5.
Python Parallel Basics
Broadly speaking, there are three ways to do concurrent programming in Python:
threads, the multiprocessing module, and finally by using bindings for the Message
Passing Interface (MPI).
Thread-based code is fine for GUIs and applications that call into external libraries that
don't tie up the Python interpreter. As we'll see in a moment, you can't use more than
one core's worth of time when running a pure-Python program. There's also no per‐
formance advantage on the HDF5 side to using threads, since the HDF5 library serializes
all calls.
multiprocessing is a more recent built-in module available with Python, which pro‐
vides support for basic fork() -based parallel processing. The main restriction is that
 
Search WWH ::




Custom Search