to handle component failures by automatically redirecting NFS clients to
another CNFS node. In all, these functions provide a scalable, highly available
NFS solution.
9.2.2 Design Overview
This section provides a high-level overview of the methods GPFS uses to
achieve highly scalable performance and high availability while supporting
standard POSIX file system APIs.
The GPFS approach to scalability is to distribute everything (data and
metadata) as evenly as possible across all available resources. This is called
wide striping. Large files are divided into large, equal-sized blocks, and
consecutive blocks are placed on different disks in a round-robin fashion. The
block size is configurable and can be as large as 16 MB in order to take
advantage of higher sequential data rates of individual disks or storage
controller LUNs.
Different nodes may read and write different parts of a large file concurrently.
This allows an application to make full use of the I/O bandwidth of the
underlying disk subsystem and interconnect, even when accessing only a single
large file. Aggregating large numbers of physical disks into a single file system
allows file system capacity and I/O bandwidth to scale with the cluster size.
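
As a rough illustration of wide striping, the following Python sketch maps
byte offsets of a file to disks in round-robin order. The 4 MB block size and
eight-disk configuration are arbitrary example values rather than GPFS
defaults, and the real allocator works from per-disk allocation maps instead
of a simple modulo; the point is only that consecutive blocks land on
different disks.

# Illustrative sketch (not GPFS source): round-robin wide striping.
# BLOCK_SIZE and NUM_DISKS are example values chosen for this sketch.

BLOCK_SIZE = 4 * 1024 * 1024      # 4 MB blocks (GPFS allows up to 16 MB)
NUM_DISKS = 8                     # disks (LUNs) backing the file system

def block_location(file_offset: int) -> tuple[int, int]:
    """Return (disk index, block index within the file) for a byte offset."""
    block_index = file_offset // BLOCK_SIZE
    disk_index = block_index % NUM_DISKS   # consecutive blocks hit different disks
    return disk_index, block_index

# A large sequential read touches every disk in turn, so aggregate bandwidth
# approaches the sum of the individual disk bandwidths.
for offset in range(0, 10 * BLOCK_SIZE, BLOCK_SIZE):
    disk, block = block_location(offset)
    print(f"block {block:2d} -> disk {disk}")
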
GPFS optimizes the organization of data within and across blocks. For
space efficiency, GPFS stores small files, as well as the data at the end of
a large file, as fragments, which are allocated by dividing a full block into
several smaller subblocks. With the GPFS file placement optimizer (FPO),
the blocks are laid out to take advantage of storage and network topology [8].
For example, when FPO is deployed using the SNC model, data blocks of a file
can be grouped into larger chunks, and each chunk is stored on disks attached
to the same node. Analytic applications can then distribute their computation
across the cluster so most data is read from local disks, thereby minimizing
data transfer over the network [2].
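
The sketch below illustrates how a file tail (or a small file) can be stored
as a fragment of whole subblocks rather than a full block. It assumes the
classic GPFS layout of 32 subblocks per block; the actual subblock count
depends on the file system version and block size, so treat the numbers as an
assumption made for the example.

# Illustrative sketch: fragment allocation for a file tail or small file.
# Assumes 32 subblocks per block, as in classic GPFS configurations.

BLOCK_SIZE = 4 * 1024 * 1024
SUBBLOCKS_PER_BLOCK = 32
SUBBLOCK_SIZE = BLOCK_SIZE // SUBBLOCKS_PER_BLOCK   # 128 KB here

def tail_allocation(file_size: int) -> dict:
    """Split a file into full blocks plus a fragment of whole subblocks."""
    full_blocks = file_size // BLOCK_SIZE
    tail_bytes = file_size % BLOCK_SIZE
    # The tail is rounded up to whole subblocks, not to a whole block.
    tail_subblocks = -(-tail_bytes // SUBBLOCK_SIZE)  # ceiling division
    return {
        "full_blocks": full_blocks,
        "fragment_subblocks": tail_subblocks,
        "allocated_bytes": full_blocks * BLOCK_SIZE
                           + tail_subblocks * SUBBLOCK_SIZE,
    }

# A 5 MB file uses one full 4 MB block plus an 8-subblock (1 MB) fragment,
# rather than two full 4 MB blocks.
print(tail_allocation(5 * 1024 * 1024))
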
GPFS uses distributed locking to synchronize access to data and metadata
on a shared disk. This protects the consistency of file system structures in
the presence of concurrent updates from multiple nodes and provides single
system image semantics without a centralized server handling all metadata
updates. It also allows each node to cache the data and metadata being
accessed on that node while maintaining cache consistency between nodes. That
means non-shared workloads can run with near-local file system performance
because each node can independently read data and metadata of the files
it accesses, cache data locally, and write updates directly back to disk. For
shared workloads, GPFS optimizes locking granularity and metadata update
algorithms to minimize interactions between nodes, so shared data can be
read and written at maximum I/O bandwidth speeds.
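
A highly simplified sketch of the idea behind range-based distributed locking
follows; it is not the GPFS token protocol itself, and the class and method
names are invented for illustration. A node holding a token for a byte range
can cache and update that range with no further messages; a conflicting
request from another node revokes only the overlapping bytes, so non-shared
workloads keep their tokens and run at near-local speed.

# Simplified sketch (not the GPFS implementation) of byte-range token
# negotiation between nodes sharing a file.

from dataclasses import dataclass

@dataclass
class Token:
    node: str
    start: int
    end: int          # exclusive; a large sentinel stands in for "end of file"

class TokenManager:
    def __init__(self):
        self.tokens: list[Token] = []

    def acquire(self, node: str, start: int, end: int) -> Token:
        surviving = []
        for t in self.tokens:
            if t.node != node and t.start < end and start < t.end:
                # Conflict: shrink the existing holder's token so only the
                # overlapping bytes move; the holder keeps the rest cached.
                if t.start < start:
                    surviving.append(Token(t.node, t.start, start))
                if end < t.end:
                    surviving.append(Token(t.node, end, t.end))
            else:
                surviving.append(t)
        token = Token(node, start, end)
        surviving.append(token)
        self.tokens = surviving
        return token

# Node A acquires one token covering the whole file, so its later writes need
# no messages; when node B writes the second half, only that range is revoked.
END = 1 << 62
mgr = TokenManager()
mgr.acquire("A", 0, END)
mgr.acquire("B", END // 2, END)
print(mgr.tokens)
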
For high availability, GPFS must allow uninterrupted file system access
in the presence of node failures, disk failures, and system maintenance
operations. Similar to other journaling file systems, GPFS records all metadata
 