15.2 History and Background
In 2002, when Argonne National Laboratory and Northwestern University
started the Parallel-NetCDF project, the climate community had for nearly
a decade prior been using the serial netCDF package [6] from UCAR. Serial
netCDF provided climate scientists half of what they needed: the library and
file format had at its foundation the kinds of multi-dimensional arrays of typed
data that naturally fit with the kinds of simulations climate scientists carry
out.
The missing half of serial netCDF was how to access these datasets in
parallel. At the time, simulations faced two unappealing choices. Either they
could do "file-per-process" I/O, producing one netCDF file for each parallel
process, or they could send all data to a master process and have that process
do all I/O. An "N-to-N" I/O model, where N processes operate on N (or
more) files, quickly poses challenges to the underlying file system as it tries
to deal with thousands of files. Writing N-to-N is far simpler than reading N
files and re-assembling the simulation state. Sending a collection of N files to
a collaborator also poses challenges. A far better solution would be to just
operate on one file.
Sending all data to a master rank to manage one file poses two challenges.
First, the master process needs enough memory to hold data from the other
parallel processes. Second, the master process quickly becomes a critical
resource, preventing all other processes from making progress. In an era where
thousand-way parallelism is routine, an approach that serializes access through
one process may be workable, but it results in unacceptable bottlenecks.
At the time, the only other application-oriented I/O library was HDF5.
Like netCDF, HDF5 provided (and continues to provide) a data model and
API well-suited to multi-dimensional arrays of typed data. (See Chapter 16
for more information about HDF5.) The HDF5 API and data model differ
significantly from netCDF's API and file format. Those HDF5 differences allow for
many powerful features, but make some optimizations more difficult. A parallel
version of netCDF offered a chance to explore parallel I/O in a more
constrained context.
15.3 Design and Architecture
The Parallel-NetCDF design should look familiar to anyone who has used
serial netCDF. Having existed for a decade before Parallel-NetCDF, serial
netCDF already had an established base of data files and codes. Nothing about the
 