5.7.3 SILO: LLNL's I/O Library
At LLNL, SILO [10] is the local I/O middleware most heavily used for application checkpointing (also known as "restart dumps") and plot dumps.
Performance of SILO in these scenarios tends to be dominated by the size
and mix of I/O requests. Performance is best when I/O requests are few
in number and as large as practical. Unfortunately, the SILO library itself, as well as other libraries, manages tiny bits of metadata "in band" with the
application's raw (or bulk) data. This has a profoundly negative impact when
run on file systems that operate on large page sizes and/or buffer pages in
multiple hardware resources between the application and the file system.
Preliminary analyses of I/O requests that are sent out the bottom of the
SILO-HDF5 I/O stack to the file system indicated that tiny bits of library
metadata requests (less than 1 KB in size) accounted for more than 90% of
all requests but less than 5% of all data. Once this picture came into focus,
the solution also became obvious. The approach was to split metadata and
raw data streams of the applications and buffer both streams in large, file-
system-friendly blocks. A new HDF5 virtual file driver (VFD), called the "SILO Block-Based VFD," was developed for this purpose.
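The block-based VFD itself ships with SILO, but the underlying idea of separating metadata and raw-data streams can be illustrated with HDF5's stock "split" file driver, which routes the two streams into separate files. The sketch below is only an analogy under that assumption (the block-based VFD additionally buffers both streams in large blocks); the file name and dataset are hypothetical.

    /* Illustrative sketch only: HDF5's stock "split" file driver separates
     * library metadata and raw data into two files, which is analogous to
     * (though simpler than) the stream separation performed by the SILO
     * block-based VFD.  File and dataset names are hypothetical; link with
     * the HDF5 library (-lhdf5). */
    #include "hdf5.h"

    int main(void)
    {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

        /* Route metadata to "dump-m.h5" and raw data to "dump-r.h5". */
        H5Pset_fapl_split(fapl, "-m.h5", H5P_DEFAULT, "-r.h5", H5P_DEFAULT);

        hid_t file = H5Fcreate("dump", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* Write one small dataset; its raw data and its metadata land in
         * different underlying files. */
        hsize_t dims[1] = {1024};
        double  buf[1024] = {0.0};
        hid_t   space = H5Screate_simple(1, dims, NULL);
        hid_t   dset  = H5Dcreate2(file, "mesh", H5T_NATIVE_DOUBLE, space,
                                   H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

        H5Dclose(dset);
        H5Sclose(space);
        H5Fclose(file);
        H5Pclose(fapl);
        return 0;
    }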
The VFD breaks a file into blocks of an application-specified size (with a default of one megabyte) and then keeps some maximum number of blocks cached in
memory at any one time (with a default of 32). In addition, I/O requests are
tagged as being composed primarily of the application's raw (or bulk) data, or
primarily of library metadata, and then targeted for different metadata or raw
data blocks accordingly. A least recently used (LRU) policy is employed to determine which cache blocks to preempt when the maximum cached block count is reached, with metadata blocks favored over raw data blocks for retention in the cache.
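As a concrete illustration, the sketch below (not the actual VFD source; the block count and access pattern are invented for the example) shows a fixed-capacity block cache that tags each block as metadata or raw data, evicts in least-recently-used order, and prefers to evict raw-data blocks before metadata blocks.

    /* Minimal sketch of a block cache with LRU eviction that favors
     * keeping metadata blocks resident over raw-data blocks. */
    #include <stdio.h>

    #define MAX_BLOCKS 4          /* the VFD's default is 32 */

    typedef enum { BLOCK_META, BLOCK_RAW } block_kind;

    typedef struct {
        long          id;         /* block number within the file     */
        block_kind    kind;       /* metadata block or raw-data block */
        unsigned long last_use;   /* LRU timestamp                    */
        int           valid;
    } block_t;

    static block_t cache[MAX_BLOCKS];
    static unsigned long clock_tick = 0;

    /* Pick a victim slot: any empty slot first; otherwise the least
     * recently used raw-data block if one exists, else the least
     * recently used metadata block. */
    static int pick_victim(void)
    {
        for (int i = 0; i < MAX_BLOCKS; i++)
            if (!cache[i].valid)
                return i;

        int victim = -1, victim_is_meta = 1;
        for (int i = 0; i < MAX_BLOCKS; i++) {
            int is_meta = (cache[i].kind == BLOCK_META);
            /* Prefer raw over metadata, then older over newer. */
            if (victim < 0 ||
                (!is_meta && victim_is_meta) ||
                (is_meta == victim_is_meta &&
                 cache[i].last_use < cache[victim].last_use)) {
                victim = i;
                victim_is_meta = is_meta;
            }
        }
        return victim;
    }

    /* Touch (and load on a miss) block `id` of the given kind. */
    static void access_block(long id, block_kind kind)
    {
        for (int i = 0; i < MAX_BLOCKS; i++) {
            if (cache[i].valid && cache[i].id == id && cache[i].kind == kind) {
                cache[i].last_use = ++clock_tick;   /* cache hit */
                return;
            }
        }
        int v = pick_victim();                      /* cache miss */
        if (cache[v].valid)
            printf("evict %s block %ld\n",
                   cache[v].kind == BLOCK_META ? "meta" : "raw", cache[v].id);
        cache[v] = (block_t){ id, kind, ++clock_tick, 1 };
    }

    int main(void)
    {
        /* The metadata block is the oldest entry, yet a raw-data block
         * is evicted first when the cache fills. */
        access_block(0, BLOCK_META);
        access_block(1, BLOCK_RAW);
        access_block(2, BLOCK_RAW);
        access_block(3, BLOCK_RAW);
        access_block(4, BLOCK_RAW);   /* prints: evict raw block 1 */
        return 0;
    }

With such a policy the small, frequently revisited metadata blocks tend to stay resident while the large raw-data blocks cycle through the cache.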
The result was that the application's interface to SILO and its interface to
HDF5 could be left unchanged, but the I/O request patterns sent out the bottom of
the I/O stack were made significantly more file-system-friendly. Performance
for some applications was improved by a factor of 50 or more.
5.7.4 Scalable Checkpoint/Restart
Another Exascale-focused effort to improve application I/O performance is
the Scalable Checkpoint/Restart library (SCR) [16], which may be described
as distributed caching in the application layer. The SCR multilevel system can store checkpoints to a compute node's local storage: to its random access memory (RAM), to its flash memory, or even to its local disk, in addition to the parallel file system. Regular
checkpoints can be saved quickly to local memory and duplicated on other
nodes. If one node fails, its data can be restored from a duplicate node. With
this technique, the parallel file system is accessed much less frequently.
SCR stores, or caches, only the most recent checkpoints, discarding an
older one as each new checkpoint is saved. It can also apply a redundancy scheme to the cached checkpoints so that their data remains recoverable after a node failure.
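A minimal sketch of how an MPI application might drive SCR through its classic checkpoint API is shown below. The function names follow the SCR user guide, but the exact signatures, the SCR_MAX_FILENAME constant, and the file naming are assumptions to be checked against the installed scr.h; the redundancy scheme and cache size are normally selected through SCR configuration parameters rather than in application code.

    /* Sketch of SCR's classic checkpoint API in an MPI code.  Checkpoint
     * contents and file names are placeholders; verify signatures against
     * the installed scr.h. */
    #include <stdio.h>
    #include <mpi.h>
    #include "scr.h"

    int main(int argc, char* argv[])
    {
        MPI_Init(&argc, &argv);
        SCR_Init();                       /* after MPI_Init */

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int step = 0; step < 100; step++) {
            /* ... advance the simulation one timestep ... */

            int need = 0;
            SCR_Need_checkpoint(&need);   /* let SCR pace the checkpoints */
            if (!need)
                continue;

            SCR_Start_checkpoint();

            /* Ask SCR where to write this rank's file; SCR may redirect
             * it to node-local storage (RAM disk, flash, or local disk). */
            char name[256], path[SCR_MAX_FILENAME];
            snprintf(name, sizeof(name), "ckpt_step%d_rank%d.dat", step, rank);
            SCR_Route_file(name, path);

            int valid = 0;
            FILE* fp = fopen(path, "w");
            if (fp != NULL) {
                valid = (fprintf(fp, "state at step %d\n", step) > 0);
                fclose(fp);
            }

            /* SCR applies its redundancy scheme across nodes and manages
             * eviction of older cached checkpoints here. */
            SCR_Complete_checkpoint(valid);
        }

        SCR_Finalize();                   /* before MPI_Finalize */
        MPI_Finalize();
        return 0;
    }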
 