Chapter 23
In-Transit Processing: Data Analysis
Using Burst Buffers
Christopher Mitchell, David Bonnie, and Jonathan Woodring
Los Alamos National Laboratory
23.1 Motivation
23.2 Design/Architecture
23.3 Systems Prototypes Related to Burst Buffers
23.4 Conclusion
Bibliography
23.1 Motivation
With the progressive march toward ever larger and faster HPC platforms,
the HPC community is seeing a discrepancy between the aggregate bandwidth
at which compute nodes can send data to storage and the bandwidth at which
traditional, hard-drive-based parallel file systems can ingest that data.
Rather than purchasing additional disks solely to increase bandwidth
(beyond what capacity requirements call for), the concept of a burst buffer
has been proposed to impedance match the compute nodes to the parallel file
system [4]. A burst buffer is an allocation of solid-state storage that can
absorb a burst of I/O activity and then drain it slowly to a parallel file
system while computation resumes within the running application.
The original design intent of such a system was to handle the I/O workload
commonly seen in the checkpoint/restart process, which many current HPC
applications use to handle faults. In checkpoint/restart, the application
pauses and writes the state of its entire memory space to a formatted file
on disk so that the application state can be reconstructed should a failure
occur later. This process is periodic in nature, but it places considerable
strain on the I/O subsystem for the several seconds to minutes it takes to
complete.
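As a rough illustration of the bandwidth mismatch described above, the
following Python sketch compares how long an application stalls when a
checkpoint is written directly to the parallel file system versus into a
burst buffer, and checks whether the buffer can drain before the next
checkpoint arrives. The function names, sizes, bandwidths, and checkpoint
interval are all hypothetical values chosen for illustration; none of them
come from this chapter or from any real system.

```python
def stall_seconds(checkpoint_bytes, ingest_bandwidth):
    """Time the application is paused while the checkpoint is absorbed.

    ingest_bandwidth is in bytes per second for whichever tier
    (burst buffer or parallel file system) receives the write.
    """
    return checkpoint_bytes / ingest_bandwidth


def drain_fits_interval(checkpoint_bytes, pfs_bandwidth, interval_seconds):
    """True if the burst buffer can drain one checkpoint to the parallel
    file system before the next checkpoint is due."""
    return checkpoint_bytes / pfs_bandwidth <= interval_seconds


# Hypothetical figures, for illustration only.
CHECKPOINT_BYTES = 100 * 10**12   # assumed 100 TB of application state
PFS_BANDWIDTH = 100 * 10**9       # assumed 100 GB/s disk-based file system
BB_BANDWIDTH = 1 * 10**12         # assumed 1 TB/s solid-state burst buffer
INTERVAL_SECONDS = 3600           # assumed one checkpoint per hour

direct_stall = stall_seconds(CHECKPOINT_BYTES, PFS_BANDWIDTH)
buffered_stall = stall_seconds(CHECKPOINT_BYTES, BB_BANDWIDTH)
drains_in_time = drain_fits_interval(
    CHECKPOINT_BYTES, PFS_BANDWIDTH, INTERVAL_SECONDS
)

print(f"stall writing directly to the file system: {direct_stall:.0f} s")
print(f"stall writing to the burst buffer:         {buffered_stall:.0f} s")
print(f"buffer drains before next checkpoint:      {drains_in_time}")
```

Under these assumed numbers the file system bandwidth only has to be
sufficient to drain a checkpoint within the checkpoint interval, while the
burst buffer bandwidth determines how long the application is paused.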
Since a checkpoint/restart dump contains the state of the application at a
given point in time, and placing it in a burst buffer temporarily positions the