34.3.1 Introducing More Asynchrony in the File System
Asynchronous I/O (AIO) is a method of allowing computation and I/O to
be performed simultaneously, potentially improving the utilization of a
system's resources and reducing the execution time of a task. AIO is most
effective when hardware support, application design, and operating system
support are all in place. This is often the case in stand-alone workstations,
but many HPC applications running at large scales are coupled, which causes
resource contention to affect the efficiency of the application. AIO directly
incurs network traffic while the application is executing, which can introduce
delays in completing synchronizing actions such as halo exchanges and
collectives.
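The overlap of computation and I/O described above can be illustrated with a minimal sketch. It uses a background thread as a stand-in for true asynchronous I/O, and the names write_checkpoint, compute_step, and run_with_async_checkpoint are hypothetical helpers introduced only for this illustration; it does not model network contention.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_checkpoint(path, payload):
    """Write payload to path and ask the OS to flush it to storage."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    return len(payload)


def compute_step(state):
    """Stand-in for one step of application computation."""
    return [x + 1 for x in state]


def run_with_async_checkpoint(state, steps, path):
    """Start a checkpoint write in the background, then keep computing."""
    # Snapshot the state (a list of ints in 0..255 for this toy example).
    # This copy must stay alive until the background write completes.
    snapshot = bytes(state)
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(write_checkpoint, path, snapshot)
        for _ in range(steps):
            state = compute_step(state)  # overlaps with the write
        written = pending.result()  # block only when the result is needed
    return state, written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "ckpt.bin")
        final_state, nbytes = run_with_async_checkpoint([0, 1, 2, 3], 3, target)
        print(final_state, nbytes)
```

The computation loop never waits on the write until pending.result() is called, which is the point at which a real application would need to confirm that the checkpoint is durable.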
Another, perhaps more significant, issue with using AIO in HPC applications
is memory pressure. A bulk synchronous checkpoint is typically going
to write a significant proportion of the application's memory to disk.
Applications often run more efficiently when using more memory, but AIO
requires holding the checkpoint's buffers until they have been moved off-node.
This extra memory use has the potential to reduce the application's overall
effectiveness to the point that AIO would cause the application to run slower.
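The memory cost can be made concrete with a rough model. The sketch below assumes, purely for illustration, that an application with state of size S checkpoints a fraction f of it and that AIO must hold a staging copy of size f*S until it leaves the node, so S + f*S must fit in node memory. The function name and the model are assumptions, not taken from the text.

```python
def usable_state_bytes(node_memory, checkpoint_fraction, asynchronous):
    """Largest application state that fits on a node under a simple model.

    Synchronous case: no staging copy outlives the write, so the state may
    fill node memory. Asynchronous case: a copy of the checkpointed fraction
    is held until it has moved off-node, so state * (1 + fraction) must fit.
    """
    if not 0.0 <= checkpoint_fraction <= 1.0:
        raise ValueError("checkpoint_fraction must be between 0 and 1")
    if asynchronous:
        return node_memory / (1.0 + checkpoint_fraction)
    return float(node_memory)
```

Under this model, an application that checkpoints all of its state can use only half of the node's memory for that state when the checkpoint is written asynchronously.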
One possible solution to these problems is to add asynchrony to the por-
tions of the parallel file system running outside of the compute node, in the
form of off-node buffer caches. There are at least two major projects that
aim to implement this concept: the Burst Buffer project, and the Sirocco file
system.
34.3.1.1 The Burst Buffer
The burst buffer (see Chapter 23) may be the most widely discussed hard-
ware I/O accelerator within the storage system research community. The
burst buffer is a high-bandwidth, flash-based storage server that is placed
between the compute nodes and disk-based storage servers, acting as an ini-
tial destination for data being written to disk. Because flash has a much
higher write bandwidth than disk, an application can checkpoint quickly to
the burst buffer, then continue computing. The burst buffer can then orches-
trate a slow data migration to disk-based storage before the next checkpoint
interval, creating a storage system that can satisfy bursty write workloads like
bulk synchronous checkpointing. Previous estimates, based on anticipated
capacities and bandwidths, show that an exascale storage system that supports
checkpoint/restart would consume only approximately 6.6% of an anticipated
exascale power budget [13].
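The timing argument behind the burst buffer can be sketched with simple arithmetic: the application pauses only while flash absorbs the checkpoint, and the slower drain to disk must finish before the next checkpoint arrives. The function name and all parameter values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not figures from the text or from [13].

```python
def burst_buffer_timing(checkpoint_gb, flash_gb_per_s, disk_gb_per_s, interval_s):
    """Compare application-visible pause with and without a burst buffer."""
    absorb_s = checkpoint_gb / flash_gb_per_s  # pause seen by the application
    drain_s = checkpoint_gb / disk_gb_per_s    # background migration to disk
    return {
        "absorb_s": absorb_s,
        "drain_s": drain_s,
        # The drain must complete before the next checkpoint is written.
        "drain_fits_interval": drain_s <= interval_s,
    }


# Illustrative values only: a 1000 GB checkpoint, 100 GB/s of flash bandwidth,
# 10 GB/s of disk bandwidth, and one checkpoint per hour.
example = burst_buffer_timing(1000, 100, 10, 3600)
```

With these illustrative values the application would pause for 10 seconds instead of the 100 seconds a direct write to disk would take, and the drain would finish well within the hour-long interval.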
34.3.1.2 Sirocco: A File System for Heterogeneous Media
Sirocco [12] is a file system that is inspired by peer-to-peer systems. It
is intended for pre-exascale machines and beyond. The core of its design is
based on a small number of fundamental ideas about how a client and a server
should behave in a scalable system. These include:
 