5.7.3 SILO: LLNL's I/O Library
At LLNL, SILO [10] is the local I/O middleware most heavily used for application checkpointing (also known as "restart dumps") and plot dumps.
Performance of SILO in these scenarios tends to be dominated by the size
and mix of I/O requests. Performance is best when I/O requests are few
in number and as large as practical. Unfortunately, the SILO library itself, as well as other libraries, manages tiny bits of metadata "in band" with the
application's raw (or bulk) data. This has a profoundly negative impact when
run on file systems that operate on large page sizes and/or buffer pages in
multiple hardware resources between the application and the file system.
Preliminary analyses of I/O requests that are sent out the bottom of the
SILO-HDF5 I/O stack to the file system indicated that tiny bits of library
metadata requests (less than 1 KB in size) accounted for more than 90% of
all requests but less than 5% of all data. Once this picture came into focus,
the solution also became obvious. The approach was to split metadata and
raw data streams of the applications and buffer both streams in large, file-
system-friendly blocks. A new HDF5 virtual file driver (VFD), called the "SILO Block-Based VFD," was developed for this purpose.
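The block-based VFD itself ships with SILO, but the underlying idea of separating metadata and raw-data streams can be illustrated with HDF5's stock "split" file driver, which routes the two streams into separate files. The sketch below is only an analogy under that assumption (the block-based VFD additionally buffers both streams in large blocks); the file name and dataset are hypothetical.

    /* Illustrative sketch only: HDF5's stock "split" file driver separates
     * library metadata and raw data into two files, which is analogous to
     * (though simpler than) the stream separation performed by the SILO
     * block-based VFD.  File and dataset names are hypothetical; link with
     * the HDF5 library (-lhdf5). */
    #include "hdf5.h"

    int main(void)
    {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

        /* Route metadata to "dump-m.h5" and raw data to "dump-r.h5". */
        H5Pset_fapl_split(fapl, "-m.h5", H5P_DEFAULT, "-r.h5", H5P_DEFAULT);

        hid_t file = H5Fcreate("dump", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* Write one small dataset; its raw data and its metadata land in
         * different underlying files. */
        hsize_t dims[1] = {1024};
        double  buf[1024] = {0.0};
        hid_t   space = H5Screate_simple(1, dims, NULL);
        hid_t   dset  = H5Dcreate2(file, "mesh", H5T_NATIVE_DOUBLE, space,
                                   H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

        H5Dclose(dset);
        H5Sclose(space);
        H5Fclose(file);
        H5Pclose(fapl);
        return 0;
    }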
The VFD breaks a file into blocks of an application-specified size (with a default of one megabyte) and then keeps some maximum number of blocks cached in
memory at any one time (with a default of 32). In addition, I/O requests are
tagged as being composed primarily of the application's raw (or bulk) data, or
primarily of library metadata, and then targeted for different metadata or raw
data blocks accordingly. A least recently used (LRU) policy is employed to determine which cache blocks to preempt when the maximum cached block count is reached, with metadata blocks favored over raw data blocks for retention in the cache.
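As a concrete illustration, the sketch below (not the actual VFD source; the block count and access pattern are invented for the example) shows a fixed-capacity block cache that tags each block as metadata or raw data, evicts in least-recently-used order, and prefers to evict raw-data blocks before metadata blocks.

    /* Minimal sketch of a block cache with LRU eviction that favors
     * keeping metadata blocks resident over raw-data blocks. */
    #include <stdio.h>

    #define MAX_BLOCKS 4          /* the VFD's default is 32 */

    typedef enum { BLOCK_META, BLOCK_RAW } block_kind;

    typedef struct {
        long          id;         /* block number within the file     */
        block_kind    kind;       /* metadata block or raw-data block */
        unsigned long last_use;   /* LRU timestamp                    */
        int           valid;
    } block_t;

    static block_t cache[MAX_BLOCKS];
    static unsigned long clock_tick = 0;

    /* Pick a victim slot: any empty slot first; otherwise the least
     * recently used raw-data block if one exists, else the least
     * recently used metadata block. */
    static int pick_victim(void)
    {
        for (int i = 0; i < MAX_BLOCKS; i++)
            if (!cache[i].valid)
                return i;

        int victim = -1, victim_is_meta = 1;
        for (int i = 0; i < MAX_BLOCKS; i++) {
            int is_meta = (cache[i].kind == BLOCK_META);
            /* Prefer raw over metadata, then older over newer. */
            if (victim < 0 ||
                (!is_meta && victim_is_meta) ||
                (is_meta == victim_is_meta &&
                 cache[i].last_use < cache[victim].last_use)) {
                victim = i;
                victim_is_meta = is_meta;
            }
        }
        return victim;
    }

    /* Touch (and load on a miss) block `id` of the given kind. */
    static void access_block(long id, block_kind kind)
    {
        for (int i = 0; i < MAX_BLOCKS; i++) {
            if (cache[i].valid && cache[i].id == id && cache[i].kind == kind) {
                cache[i].last_use = ++clock_tick;   /* cache hit */
                return;
            }
        }
        int v = pick_victim();                      /* cache miss */
        if (cache[v].valid)
            printf("evict %s block %ld\n",
                   cache[v].kind == BLOCK_META ? "meta" : "raw", cache[v].id);
        cache[v] = (block_t){ id, kind, ++clock_tick, 1 };
    }

    int main(void)
    {
        /* The metadata block is the oldest entry, yet a raw-data block
         * is evicted first when the cache fills. */
        access_block(0, BLOCK_META);
        access_block(1, BLOCK_RAW);
        access_block(2, BLOCK_RAW);
        access_block(3, BLOCK_RAW);
        access_block(4, BLOCK_RAW);   /* prints: evict raw block 1 */
        return 0;
    }

With such a policy the small, frequently revisited metadata blocks tend to stay resident while the large raw-data blocks cycle through the cache.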
The result was that the application's interface to SILO and its interface to
HDF5 could be left unchanged, but the I/O request patterns sent out the bottom of
the I/O stack were made significantly more file-system-friendly. Performance
for some applications was improved by a factor of 50 or more.
5.7.4 Scalable Checkpoint/Restart
Another Exascale-focused effort to improve application I/O performance is
the Scalable Checkpoint/Restart library (SCR) [16], which may be described
as distributed caching in the application layer. The SCR multilevel system can store checkpoints to a compute node's local storage: to its random access memory (RAM), to its flash memory, or even to its local disk, in addition to the parallel file system. Regular
checkpoints can be saved quickly to local memory and duplicated on other
nodes. If one node fails, its data can be restored from a duplicate node. With
this technique, the parallel file system is accessed much less frequently.
SCR stores, or caches, only the most recent checkpoints, discarding an
older one as each new checkpoint is saved. It can also apply a redundancy scheme to the cached checkpoints so that their data remains recoverable after a node failure.
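A minimal sketch of how an MPI application might drive SCR through its classic checkpoint API is shown below. The function names follow the SCR user guide, but the exact signatures, the SCR_MAX_FILENAME constant, and the file naming are assumptions to be checked against the installed scr.h; the redundancy scheme and cache size are normally selected through SCR configuration parameters rather than in application code.

    /* Sketch of SCR's classic checkpoint API in an MPI code.  Checkpoint
     * contents and file names are placeholders; verify signatures against
     * the installed scr.h. */
    #include <stdio.h>
    #include <mpi.h>
    #include "scr.h"

    int main(int argc, char* argv[])
    {
        MPI_Init(&argc, &argv);
        SCR_Init();                       /* after MPI_Init */

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int step = 0; step < 100; step++) {
            /* ... advance the simulation one timestep ... */

            int need = 0;
            SCR_Need_checkpoint(&need);   /* let SCR pace the checkpoints */
            if (!need)
                continue;

            SCR_Start_checkpoint();

            /* Ask SCR where to write this rank's file; SCR may redirect
             * it to node-local storage (RAM disk, flash, or local disk). */
            char name[256], path[SCR_MAX_FILENAME];
            snprintf(name, sizeof(name), "ckpt_step%d_rank%d.dat", step, rank);
            SCR_Route_file(name, path);

            int valid = 0;
            FILE* fp = fopen(path, "w");
            if (fp != NULL) {
                valid = (fprintf(fp, "state at step %d\n", step) > 0);
                fclose(fp);
            }

            /* SCR applies its redundancy scheme across nodes and manages
             * eviction of older cached checkpoints here. */
            SCR_Complete_checkpoint(valid);
        }

        SCR_Finalize();                   /* before MPI_Finalize */
        MPI_Finalize();
        return 0;
    }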
 