PLFS offers three main modes of operation (shared, flat, and small file), essentially comparable to mount-time options, and there are several interesting use cases for each mode.
14.2 Design/Architecture
PLFS is mainly designed to run as middleware on the compute nodes
themselves. It can run in the user space of applications using MPI-IO and a
patched MPI library, or in the user space of applications which are ported to
link directly into the PLFS API. The PLFS API closely mirrors the standard
POSIX API; thus porting several applications and synthetic benchmarks has
been straightforward. PLFS is also available as a FUSE file system [1] for
unmodified applications which do not use MPI-IO. Since the FUSE approach
can incur high overhead, there is also an LD_PRELOAD interface that brings
PLFS into the user space of unmodified applications [8].
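Because the PLFS API closely mirrors POSIX, ordinary POSIX I/O code needs no changes to use PLFS through the FUSE mount or the LD_PRELOAD interface. The fragment below is a minimal illustration rather than PLFS-specific code: the path is an example, and under the FUSE mount these calls reach PLFS through the kernel, while with LD_PRELOAD the same calls are intercepted in user space.

    /* Plain POSIX I/O; nothing here is PLFS-specific. Under a PLFS FUSE
     * mount (example path below) or with the PLFS LD_PRELOAD interposer,
     * these calls are transparently handled by PLFS. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/mnt/plfs/sharedfile";  /* example PLFS mount path */
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char buf[] = "checkpoint data\n";
        if (write(fd, buf, sizeof(buf) - 1) < 0)
            perror("write");

        close(fd);
        return 0;
    }

With the LD_PRELOAD interface, the same unmodified binary is launched with the PLFS interposition library named in the LD_PRELOAD environment variable, which avoids the FUSE overhead.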
There are three main modes of PLFS, which are set in a PLFS configuration file and defined on a per-path basis. The options for each path define the path that the user will use (e.g., /mnt/plfs/sharedfile), the mode of operation,
and the underlying storage system(s) that PLFS will use for the actual storage
of the user data as well as its own metadata. Typically, the underlying storage
system is a globally visible storage system, and the PLFS configuration file is
shared across a set of compute nodes such that each compute node can write
to the same PLFS file(s) and each compute node can read PLFS files written
from a different compute node.
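As a concrete illustration, a per-path entry in the PLFS configuration file (commonly called plfsrc) might look like the following sketch. The paths are hypothetical and the exact keyword names and accepted values can differ between PLFS versions, so this should be read as a structural example rather than definitive syntax: a user-visible mount point, a mode of operation, and one or more backend storage systems.

    # Illustrative plfsrc-style entry; paths and keyword spellings are examples.
    mount_point /mnt/plfs
    workload n-1                  # shared file mode (assumed keyword/value)
    backends /panfs/vol1/.plfs_store,/panfs/vol2/.plfs_store

Listing more than one backend is how the umbrella behavior described below is configured: PLFS spreads its data across all of the listed storage systems so that their bandwidth and metadata servers are used in aggregate.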
The three main configurations of PLFS are shared file, small file, and
flat file, each of which is intended for different application I/O workloads.
Additionally, there is a burst buffer configuration (which currently works only
in shared file mode) to transparently gain performance benefits from a smaller,
faster storage tier such as flash memory. All three modes support the ability
to use PLFS as an umbrella file system that can distribute workloads across
multiple underlying storage systems to aggregate their bandwidth and utilize
all available metadata servers. Finally, there is support in PLFS to run with
all three modes on top of cloud file systems such as Hadoop.
14.2.1 PLFS Shared File Mode
Shared file mode is the original PLFS configuration [3] and is designed
for highly concurrent writes to a shared file, such as a checkpoint file which
is simultaneously written by all processes within a large parallel application.
The architecture of PLFS shared file mode is shown in Figure 14.1. Note that
the figure shows the PLFS layer as a separate layer; this is accurate from the
perspective of the application but in fact the PLFS software runs on each
compute node. This mode was motivated by the well-known observation that N-1 checkpoint workloads, in which many processes write concurrently to a single shared file, perform poorly on most parallel file systems.
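The target workload can be sketched with a short MPI-IO fragment: every rank of a parallel job writes its own block of checkpoint state into one shared file at a rank-dependent offset. The file path and block size below are arbitrary examples; the point is the highly concurrent N-1 access pattern that shared file mode is designed to absorb.

    /* Sketch of an N-1 checkpoint: each rank writes one block of a
     * single shared file. Path and block size are examples only. */
    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    #define BLOCK_SIZE (1 << 20)              /* 1 MiB of state per rank */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char *buf = malloc(BLOCK_SIZE);       /* dummy checkpoint state */
        memset(buf, rank, BLOCK_SIZE);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "/mnt/plfs/checkpoint.000",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Every rank targets a disjoint, rank-dependent region of the file. */
        MPI_Offset offset = (MPI_Offset)rank * BLOCK_SIZE;
        MPI_File_write_at(fh, offset, buf, BLOCK_SIZE, MPI_BYTE,
                          MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        free(buf);
        MPI_Finalize();
        return 0;
    }

Without an interposition layer such as PLFS, this pattern forces the underlying parallel file system to coordinate many interleaved writes to one shared file; PLFS instead logs each process's writes separately within a container and reassembles the logical file on read [3].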
 