PLFS: Software-Defined Storage for HPC - High Performance Parallel I/O

FIGURE 14.1 (See color insert): PLFS shared file mode transforms multiple writes to a shared file into streams of data sent to multiple subfiles on the underlying storage system(s). Not shown is the internal PLFS metadata used to reconstruct the file. [Image courtesy of John Bent (EMC).]

many applications naturally have partitions of a large distributed data structure that are poorly matched to the block alignment of many storage systems, and therefore lose performance to the various locks and serialization bottlenecks inherent in parallel file systems.
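To make the alignment problem concrete, the hypothetical helper below counts how many file-system blocks would be written by more than one process when each process writes one contiguous region, back to back, into a single shared file. The block size and region sizes are illustrative assumptions, and real parallel file systems lock at varying granularities (stripes, extents, byte ranges), so this only approximates where contention can arise.

```python
def shared_blocks(region_sizes, block_size):
    """Count blocks that more than one writer touches in a shared file.

    Hypothetical helper: rank i writes one contiguous region of
    region_sizes[i] bytes, and the regions are laid out back to back.
    """
    owners = {}  # block index -> set of ranks that write into it
    offset = 0
    for rank, size in enumerate(region_sizes):
        if size > 0:
            first = offset // block_size
            last = (offset + size - 1) // block_size
            for block in range(first, last + 1):
                owners.setdefault(block, set()).add(rank)
        offset += size
    # In a file system that locks whole blocks, the writers of such a
    # block contend for its lock and are effectively serialized.
    return sum(1 for ranks in owners.values() if len(ranks) > 1)


print(shared_blocks([65536] * 4, 65536))  # aligned partitions -> 0
print(shared_blocks([47001] * 4, 65536))  # unaligned partitions -> 3
```

With 65,536-byte blocks, four 65,536-byte partitions share no blocks, whereas four 47,001-byte partitions leave three blocks that are each written by two processes.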

By decoupling the concurrent writes into independent data streams sent to the underlying storage systems, PLFS avoids these locks and bottlenecks. The basic mechanism is that PLFS first creates a PLFS container and then stores in it all of the individual subfiles, as well as the metadata necessary to re-create the logical file. Functionally, the container plays much the same role as the inodes used in almost all file systems since the Berkeley Fast File System [7]. When the user requests data from the file, PLFS consults the metadata within the container to resolve which subfile(s) contain the requested data, and then reads from those subfile(s) to return the data to the reading application.
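A toy, in-memory sketch of the container idea may help. It assumes one append-only subfile per writer and a single index recording, for each write, its logical offset, length, subfile, and physical offset within that subfile. Overlapping writes are resolved by index order (last write wins), which is a simplification; a real implementation needs a more careful ordering rule. `Container`, `IndexEntry`, `write`, and `read` are hypothetical names, not the PLFS API.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    logical_offset: int   # where the bytes live in the logical shared file
    length: int           # number of bytes written
    writer: int           # which subfile (one per writer) holds them
    physical_offset: int  # where they live inside that subfile


@dataclass
class Container:
    subfiles: dict = field(default_factory=dict)  # writer -> bytearray
    index: list = field(default_factory=list)     # IndexEntry items, in write order

    def write(self, writer, logical_offset, data):
        log = self.subfiles.setdefault(writer, bytearray())
        self.index.append(IndexEntry(logical_offset, len(data), writer, len(log)))
        log.extend(data)  # append to this writer's own subfile only

    def read(self, logical_offset, length):
        out = bytearray(length)  # never-written holes read back as zeros
        end = logical_offset + length
        for e in self.index:  # later entries overwrite earlier ones
            lo = max(logical_offset, e.logical_offset)
            hi = min(end, e.logical_offset + e.length)
            if lo < hi:
                src = e.physical_offset + (lo - e.logical_offset)
                chunk = self.subfiles[e.writer][src:src + (hi - lo)]
                out[lo - logical_offset:hi - logical_offset] = chunk
        return bytes(out)


c = Container()
c.write(1, 4, b"EFGH")  # rank 1 writes the second half of the file first
c.write(0, 0, b"ABCD")  # rank 0 writes the first half
print(c.read(0, 8))     # b'ABCDEFGH'
```

Each writer only ever appends to its own subfile, so writers never touch the same underlying blocks; the cost moves to reads, which must consult the index to reassemble the logical file.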

Note that at no point is the application aware of this transformation: all operations on the shared file behave exactly as they would if PLFS were not present. One concern in PLFS, however, is that the amount of PLFS metadata can grow to challenging sizes; this concern is addressed by discovering hidden structure within seemingly unstructured I/O [6].
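The structure-discovery technique of [6] is not reproduced here. As a much simpler illustration of why structure shrinks the metadata, the sketch below folds runs of index entries that share a length, advance by a constant logical stride, and sit back to back in the writer's subfile into a single (start, stride, length, count) pattern. `Pattern`, `compress`, and `expand` are hypothetical names, and the sketch assumes each writer's index is compressed separately and that a writer's data is appended contiguously to its subfile.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    logical_start: int   # logical offset of the first write in the run
    stride: int          # distance between consecutive logical offsets
    length: int          # bytes per write (identical across the run)
    count: int           # number of writes the pattern stands for
    physical_start: int  # subfile offset of the first write


def compress(entries):
    """Greedily fold one writer's index, given as (logical_offset, length,
    physical_offset) tuples in write order, into strided patterns."""
    patterns = []
    for lo, ln, po in entries:
        if patterns:
            p = patterns[-1]
            # Same size and appended right after the run in the subfile?
            extends = ln == p.length and po == p.physical_start + p.length * p.count
            if extends and p.count == 1 and lo > p.logical_start:
                p.stride = lo - p.logical_start  # second write fixes the stride
                p.count += 1
                continue
            if extends and p.count > 1 and lo == p.logical_start + p.stride * p.count:
                p.count += 1
                continue
        patterns.append(Pattern(lo, 0, ln, 1, po))
    return patterns


def expand(patterns):
    """Recover the original index entries, in order, from the patterns."""
    return [(p.logical_start + i * p.stride, p.length, p.physical_start + i * p.length)
            for p in patterns for i in range(p.count)]


# Rank 1 of 4 writes 1000 records of 100 bytes in an N-1 strided layout.
entries = [(100 + k * 400, 100, k * 100) for k in range(1000)]
patterns = compress(entries)
print(len(entries), "->", len(patterns))  # 1000 -> 1
```

A regular strided write pattern collapses to one entry, while irregular writes fall back to one pattern per write, so the compressed index never holds more patterns than the original had entries.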
