network), is novel in the way it scales [10]. Panasas storage shelves are added one
“lane” at a time. A lane is a collection of shelves sharing a (redundant) lane
switch that is connected to a subset of the I/O nodes in each cluster, with
a much lower bandwidth route between shelves for Panasas internal traffic.
Most of a cluster's I/O nodes are not routed to each lane, so the storage
switches do not have to get larger as the number of lanes increases. Thus,
routing to a storage device is done primarily in the compute nodes when they
select an I/O node to route through. In contrast, many supercomputers have
a full bisection bandwidth storage network between I/O nodes and storage
devices so that compute node to I/O node routing can be based only on the
compute node address. With this arrangement, however, the storage network
may become very expensive as the storage scales. Additionally, PaScalBB I/O
nodes are redundant and load balanced by using standard IP routing software
in the Linux client software and Panasas devices.
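The lane-aware selection described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the topology table, the node names, and the hash-based spreading are hypothetical, and the hash stands in for the standard IP routing software that the text says PaScalBB actually uses.

```python
from zlib import crc32

# Hypothetical topology: the lanes each I/O node is cabled to.
# Each I/O node reaches only a subset of lanes, as in the text.
IO_NODE_LANES = {
    "io0": {0, 1},
    "io1": {0, 1},
    "io2": {2, 3},
    "io3": {2, 3},
}


def select_io_node(compute_node, lane, failed=frozenset()):
    """Pick a healthy I/O node that reaches `lane` for this compute node."""
    candidates = sorted(
        name
        for name, lanes in IO_NODE_LANES.items()
        if lane in lanes and name not in failed
    )
    if not candidates:
        raise RuntimeError(f"no healthy I/O node reaches lane {lane}")
    # A deterministic hash spreads compute nodes across the candidates
    # (illustrative load balancing, not the IP routing used in practice).
    return candidates[crc32(compute_node.encode()) % len(candidates)]
```

Calling `select_io_node("cn17", 2)` returns either `io2` or `io3`, and passing `failed={"io2"}` leaves `io3` as the only choice. The point of the sketch is that the routing decision depends on the target lane as well as the compute node address, which is what lets the lane switches stay small.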
2.4 Interfacing with Applications
Computational science applications are complex software systems that operate
in terms of their own data models. When the time comes for these applications
to interact with the storage system, these internal data models must be
mapped onto structures that the storage system understands. Additional
software layers (an “I/O software stack”) sit on top of the underlying
storage hardware. The interfaces that these layers provide help applications
use the available storage efficiently and easily.
The I/O software stacks being deployed today consist of three layers. At the
lowest layer, parallel file systems maintain a global name space and provide
efficient and coherent access to data. These file systems present a POSIX
or POSIX-like interface [11] for applications as well as richer interfaces for use
by upper layers. The second layer is the I/O middleware layer. This layer
is responsible for mapping I/O into the programming model being used by
the application, with MPI-IO [12] being a good example. The top layer is the
high-level I/O library layer, with software such as Parallel netCDF [13] and
HDF5 [14] providing structured interfaces and data formats to help bridge the
gap between application data models and the storage system.
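To make the division of labor concrete, the following toy sketch imitates the three roles using only the Python standard library. It is not MPI-IO, Parallel netCDF, or HDF5: the file layout, the 256-byte header, and the function names are all hypothetical, and the "ranks" are simulated in one process. It assumes a Unix-like system, since it relies on `os.pwrite`.

```python
import json
import os
import struct
import tempfile

HEADER_SIZE = 256  # hypothetical fixed-size metadata region, in bytes


def write_header(fd, name, shape):
    """High-level library role: record what the stored bytes mean."""
    meta = json.dumps({"name": name, "shape": shape, "dtype": "<f8"}).encode()
    os.pwrite(fd, meta.ljust(HEADER_SIZE, b" "), 0)


def write_block(fd, rank, rows_per_rank, ncols, rows):
    """Middleware role: map a rank's rows of the array to a byte offset."""
    offset = HEADER_SIZE + rank * rows_per_rank * ncols * 8
    payload = b"".join(struct.pack(f"<{ncols}d", *row) for row in rows)
    # File system role: a POSIX-style positional write of raw bytes.
    os.pwrite(fd, payload, offset)


def read_array(path):
    """Recover the name and the 2D array using only the file's own header."""
    with open(path, "rb") as f:
        meta = json.loads(f.read(HEADER_SIZE))
        nrows, ncols = meta["shape"]
        flat = struct.unpack(f"<{nrows * ncols}d", f.read(nrows * ncols * 8))
    return meta["name"], [list(flat[r * ncols:(r + 1) * ncols]) for r in range(nrows)]


def demo():
    """Two simulated ranks each write two rows of a 4 x 3 array, out of order."""
    path = os.path.join(tempfile.mkdtemp(), "temperature.bin")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        write_header(fd, "temperature", [4, 3])
        for rank in (1, 0):
            rows = [[float(rank * 2 + r) * 10 + c for c in range(3)] for r in range(2)]
            write_block(fd, rank, 2, 3, rows)
    finally:
        os.close(fd)
    return read_array(path)
```

The file system layer sees only offsets and bytes. The offset arithmetic and the self-describing header are the parts that real middleware and high-level libraries take over from the application.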
In this section we discuss some of the common patterns of access in
applications and then cover the interfaces that the I/O software stack makes
available for applications to interact with underlying storage.
2.4.1 Access Patterns
As a result of the number of processes involved and the kinds of data being
stored, computational science applications have access patterns that in many