community have pointed to a number of viable alternative models. This section
discusses a few of the more relevant examples.
The IBM Virtual Storage Access Method (VSAM) model [15], defined in
the 1970s, provides a number of features that would be compelling in an HPC
system. Data is stored as records of potentially variable length with multiple
fields. Data items can be referenced with a key, with a record number, or the
file can be directly accessed with byte offsets. Multiple dataset organizations
are provided to cater to specific use cases.
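
The three access paths described above (key, record number, byte offset) can be illustrated with a small in-memory sketch. This is not the VSAM interface; the class `RecordFile` and its method names are hypothetical and only show how one record store can answer all three kinds of request.

```python
from dataclasses import dataclass, field


@dataclass
class RecordFile:
    """Toy record-oriented file (hypothetical, not the VSAM API)."""
    data: bytearray = field(default_factory=bytearray)   # raw bytes of all records
    extents: list = field(default_factory=list)          # (offset, length) per record number
    by_key: dict = field(default_factory=dict)           # key -> record number

    def append(self, key: str, payload: bytes) -> int:
        """Store a variable-length record and return its record number."""
        offset = len(self.data)
        self.data += payload
        self.extents.append((offset, len(payload)))
        number = len(self.extents) - 1
        self.by_key[key] = number
        return number

    def read_by_number(self, number: int) -> bytes:
        """Access by record number."""
        offset, length = self.extents[number]
        return bytes(self.data[offset:offset + length])

    def read_by_key(self, key: str) -> bytes:
        """Access by key, resolved through the key index."""
        return self.read_by_number(self.by_key[key])

    def read_bytes(self, offset: int, length: int) -> bytes:
        """Direct access by byte offset, ignoring record boundaries."""
        return bytes(self.data[offset:offset + length])
```

A caller could append records of different lengths and then read the same data back through whichever path suits the application.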
While most users do not realize it, the Microsoft New Technology File
System (NTFS) also provides an interesting alternative model in the form of
alternative data streams [5]. This functionality allows for multiple streams of
data to be associated with the same file name. A default data stream holds
standard "POSIX-style" data, while a colon notation is used to define and
access additional named streams under the same file name.
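
As a minimal sketch of the colon notation, the helper below builds the path for a named stream. The function name `stream_path` is hypothetical, and the check on the stream name is a simplification rather than a full statement of NTFS naming rules.

```python
def stream_path(file_name: str, stream: str = "") -> str:
    """Build the path for a named stream of file_name.

    An empty stream name refers to the default (unnamed) data stream.
    Hypothetical helper: it only composes the string and touches no files.
    """
    if ":" in stream:
        # The colon is the separator, so it cannot appear inside the name.
        raise ValueError("stream name must not contain ':'")
    return f"{file_name}:{stream}" if stream else file_name
```

On a Windows system with an NTFS volume, passing `stream_path("report.txt", "notes")` to an ordinary file-open call should address the named stream, while `stream_path("report.txt")` addresses the default stream. On other file systems the colon form is just an unusual file name.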
This model of multiple streams associated with a single file name is not
unique to NTFS, and in fact the approach has appeared in HPC parallel file
systems research as well. The Galley parallel file system [16], developed in
the 1990s, supported a concept of subfiles. In this model, a set of subfiles
that mapped to underlying storage devices was created at the time the file
was created. Each subfile then contained a set of forks, each of which could
hold an array of bytes (like a normal POSIX file). The authors showed how upper
software layers could map astronomical data into this organization.
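
The file, subfile, and fork hierarchy can be pictured with the following sketch. It is an illustration of the structure only, not the Galley interface: the class names, the integer device identifiers, and the use of string names for forks are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Subfile:
    """One subfile, tied to one storage device (device id is an assumption)."""
    device: int
    forks: dict = field(default_factory=dict)   # fork name -> bytearray

    def fork(self, name: str) -> bytearray:
        """Return the named fork, creating an empty byte array on first use."""
        return self.forks.setdefault(name, bytearray())


@dataclass
class GalleyStyleFile:
    """A file whose set of subfiles is fixed when the file is created."""
    subfiles: list

    @classmethod
    def create(cls, n_devices: int) -> "GalleyStyleFile":
        # One subfile per device, decided once at creation time.
        return cls([Subfile(device=d) for d in range(n_devices)])
```

An upper software layer would then decide which fork of which subfile receives each piece of a dataset, for example with `f.subfiles[2].fork("pixels").extend(chunk)`.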
The Vesta parallel file system [8] was developed at IBM in the 1990s
specifically for HPC. Vesta exposes a 2D structure for files, with physical partitions
holding sequences of records. Physical partitions are similar to subfiles in the
Galley model and are meant to map to storage nodes. This provides a notion
of parallelism of access that has been adopted by current research in the area.
30.3.2 Object Abstractions in HPC
Work in object-based file systems set the stage for one possible alternative:
providing direct access to storage objects. Researchers are investigating how
to expose an object abstraction while maintaining the existing namespace
abstraction. In this model, a directory entry refers to a collection of objects,
each individually accessible.
The "End of Files" (EOF) [11] project is one such example. Goodell et al.
developed a prototype atop PVFS [7] that allows for a static set of objects
to be associated with a directory entry. Conceptually, this is best thought of
as the file system no longer owning the distribution of data into objects, but
rather delegating this to higher-level software layers. This approach exposes
the natural unit of concurrency (i.e., the object) and provides multiple data
streams that may be used by upper layers for organizational purposes.
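
The idea of a directory entry that names a static set of individually accessible objects can be sketched as follows. This is not the EOF prototype's interface; `ObjectNamespace` and its methods are hypothetical, and objects are modeled as in-memory byte arrays.

```python
class ObjectNamespace:
    """Toy namespace where each entry refers to a fixed collection of objects."""

    def __init__(self):
        self.entries = {}   # entry name -> tuple of bytearray objects

    def create(self, name: str, n_objects: int) -> None:
        """Create an entry with a static number of objects."""
        # A tuple keeps the object count fixed after creation.
        self.entries[name] = tuple(bytearray() for _ in range(n_objects))

    def object(self, name: str, index: int) -> bytearray:
        """Return one object directly; the caller decides what goes where."""
        return self.entries[name][index]
```

In this sketch the namespace never decides how data is distributed. A higher-level library chooses the object index for each piece of data, so independent writers can work on different objects under the same name.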
Figure 30.2 shows how the PnetCDF (Chapter 15) library maps netCDF
datasets to a POSIX file (left) or to the EOF object model (right). In the
POSIX file mapping, PnetCDF lays out variables across the single file byte
 