2.1 From Disk Arrays to Parallel Data Storage
The preceding chapter covered how disk arrays can combine many disks into
a unit with high aggregate input/output (I/O) performance. This technique
enables the construction of high-performance file systems that are
accessible from single systems. When such an array is combined with a
networked file system such as NFS and high-performance networking hardware,
many client nodes can access this resource, enabling I/O to a shared
storage system from a parallel application.
However, disk arrays are not the end of the story. Vectoring all I/O
operations through a single server can create a serious bottleneck because
I/O is limited by the bandwidth between the network and the server and the
communication paths internal to the server. To attain the aggregate
bandwidths required in today's parallel systems, we need to eliminate this
bottleneck as well.
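The bottleneck argument can be made concrete with a little arithmetic. The
following is a minimal sketch, not a performance model from the text: the
function names and the bandwidth figures are hypothetical, and it assumes
that each additional server brings its own network link and that clients
and the network core are never the limit.

```python
def single_server_bandwidth(link_gbps: float, internal_gbps: float) -> float:
    """Throughput through one server is capped by its slowest path:
    the network link into the server or the paths inside it."""
    return min(link_gbps, internal_gbps)


def ideal_aggregate_bandwidth(n_servers: int, link_gbps: float,
                              internal_gbps: float) -> float:
    """Idealized upper bound when every server has its own link and all
    servers are kept equally busy (an assumption, rarely met exactly)."""
    return n_servers * single_server_bandwidth(link_gbps, internal_gbps)


if __name__ == "__main__":
    # Hypothetical numbers: a 10 Gb/s link in front of a 25 Gb/s disk array.
    print(single_server_bandwidth(10.0, 25.0))       # the link is the limit
    print(ideal_aggregate_bandwidth(4, 10.0, 25.0))  # four such servers
```

Under these assumptions a faster disk array behind a single server does not
help once the link is saturated; only adding independent data paths raises
the bound.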
Parallel data storage systems combine multiple network links and storage
components with software that organizes this hardware into a single,
coherent file system accessible by all the nodes in a parallel system. The
software responsible for performing this organization is commonly called a
parallel file system. Figure 2.1 illustrates a parallel file system. On the
left we see a simple directory hierarchy, with one astrophysics checkpoint
file and two protein sequence files stored in different directories. On the
right we see how this directory structure is mapped onto hardware. I/O
servers (IOSs) store components of the file system, including directories
and pieces of files. Distribution of data across IOSs is managed by the
parallel file system, and the distribution policy is often user tunable. In
this example, data is split into stripes that are referenced by handles
(e.g., H01, H02). The large checkpoint file has been split across four
servers to enable greater concurrency, while the smaller
[Figure 2.1 diagram: clients (C) running PFS software, a communication
network, and four IOSs holding objects labeled H01 to H06; the directory
tree shows /pfs, /astro, and /bio with the files chkpt32.nc, prot04.seq,
and prot17.seq.]
Figure 2.1 In a parallel storage system, data is distributed across multiple
I/O servers (IOSs), allowing multiple data paths to be used concurrently to
enable high throughput. Clients access this storage via parallel file system
(PFS) software that drives communication over an interconnection network.
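To illustrate how a file might be split into stripes across IOSs, here is a
minimal sketch. It assumes a simple round-robin layout with a fixed stripe
size; the text only says that the distribution policy is managed by the
parallel file system and is often user tunable, so the policy, the function
names, and the 64 KiB stripe size are all hypothetical choices made for
illustration.

```python
from typing import List, Tuple


def locate(offset: int, stripe_size: int, n_servers: int) -> Tuple[int, int]:
    """Map a byte offset in the file to (server index, byte offset within
    that server's local object), assuming round-robin striping."""
    stripe = offset // stripe_size        # which stripe of the file
    server = stripe % n_servers           # stripes rotate over the servers
    local_stripe = stripe // n_servers    # stripes already on that server
    return server, local_stripe * stripe_size + offset % stripe_size


def split_request(offset: int, length: int, stripe_size: int,
                  n_servers: int) -> List[Tuple[int, int, int]]:
    """Break one contiguous request into (server, local offset, nbytes)
    pieces, none of which crosses a stripe boundary."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local = locate(offset, stripe_size, n_servers)
        nbytes = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((server, local, nbytes))
        offset += nbytes
    return pieces


if __name__ == "__main__":
    STRIPE = 64 * 1024  # hypothetical stripe size
    # A read of four full stripes over four servers touches each server once.
    for piece in split_request(0, 4 * STRIPE, STRIPE, 4):
        print(piece)
```

Because consecutive stripes land on different servers, a large sequential
request is served by several IOSs at once, which is the concurrency the
caption describes.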