to handle component failures by automatically redirecting NFS clients to
another CNFS node. In all, these functions provide a scalable, highly available
NFS solution.
9.2.2 Design Overview
This section provides a high-level overview of the methods GPFS uses to
achieve highly scalable performance and high availability while supporting
standard POSIX file system APIs.
The GPFS approach to scalability is to distribute everything (data and
metadata) as evenly as possible across all available resources. This is called
wide striping. Large files are divided into large, equal-sized blocks, and
consecutive blocks are placed on different disks in a round-robin fashion. The
block size is configurable and can be as large as 16 MB in order to take
advantage of higher sequential data rates of individual disks or storage
controller LUNs.
Different nodes may read and write different parts of a large file concurrently.
This allows an application to make full use of the I/O bandwidth of the
underlying disk subsystem and interconnect, even when accessing only a single
large file. Aggregating large numbers of physical disks into a single file system
allows file system capacity and I/O bandwidth to scale with the cluster size.
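
As a rough illustration of wide striping, the following Python sketch maps
byte offsets of a file to disks in round-robin order. The 4 MB block size and
eight-disk configuration are arbitrary example values rather than GPFS
defaults, and the real allocator works from per-disk allocation maps instead
of a simple modulo; the point is only that consecutive blocks land on
different disks.

# Illustrative sketch (not GPFS source): round-robin wide striping.
# BLOCK_SIZE and NUM_DISKS are example values chosen for this sketch.

BLOCK_SIZE = 4 * 1024 * 1024      # 4 MB blocks (GPFS allows up to 16 MB)
NUM_DISKS = 8                     # disks (LUNs) backing the file system

def block_location(file_offset: int) -> tuple[int, int]:
    """Return (disk index, block index within the file) for a byte offset."""
    block_index = file_offset // BLOCK_SIZE
    disk_index = block_index % NUM_DISKS   # consecutive blocks hit different disks
    return disk_index, block_index

# A large sequential read touches every disk in turn, so aggregate bandwidth
# approaches the sum of the individual disk bandwidths.
for offset in range(0, 10 * BLOCK_SIZE, BLOCK_SIZE):
    disk, block = block_location(offset)
    print(f"block {block:2d} -> disk {disk}")
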
GPFS optimizes the organization of data within and across blocks. For
space efficiency, GPFS stores small files, as well as the data at the end of
a large file, as fragments, which are allocated by dividing a full block into
several smaller subblocks. With the GPFS file placement optimizer (FPO),
the blocks are laid out to take advantage of storage and network topology [8].
For example, when FPO is deployed using the SNC model, data blocks of a file
can be grouped into larger chunks, and each chunk is stored on disks attached
to the same node. Analytic applications can then distribute their computation
across the cluster so most data is read from local disks, thereby minimizing
data transfer over the network [2].
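
The sketch below illustrates how a file tail (or a small file) can be stored
as a fragment of whole subblocks rather than a full block. It assumes the
classic GPFS layout of 32 subblocks per block; the actual subblock count
depends on the file system version and block size, so treat the numbers as an
assumption made for the example.

# Illustrative sketch: fragment allocation for a file tail or small file.
# Assumes 32 subblocks per block, as in classic GPFS configurations.

BLOCK_SIZE = 4 * 1024 * 1024
SUBBLOCKS_PER_BLOCK = 32
SUBBLOCK_SIZE = BLOCK_SIZE // SUBBLOCKS_PER_BLOCK   # 128 KB here

def tail_allocation(file_size: int) -> dict:
    """Split a file into full blocks plus a fragment of whole subblocks."""
    full_blocks = file_size // BLOCK_SIZE
    tail_bytes = file_size % BLOCK_SIZE
    # The tail is rounded up to whole subblocks, not to a whole block.
    tail_subblocks = -(-tail_bytes // SUBBLOCK_SIZE)  # ceiling division
    return {
        "full_blocks": full_blocks,
        "fragment_subblocks": tail_subblocks,
        "allocated_bytes": full_blocks * BLOCK_SIZE
                           + tail_subblocks * SUBBLOCK_SIZE,
    }

# A 5 MB file uses one full 4 MB block plus an 8-subblock (1 MB) fragment,
# rather than two full 4 MB blocks.
print(tail_allocation(5 * 1024 * 1024))
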
GPFS uses distributed locking to synchronize access to data and metadata
on a shared disk. This protects the consistency of file system structures in
the presence of concurrent updates from multiple nodes and provides single
system image semantics without a centralized server handling all metadata
updates. It also allows each node to cache the data and metadata being
accessed on that node while maintaining cache consistency between nodes. That
means non-shared workloads can run with near-local file system performance
because each node can independently read data and metadata of the files
it accesses, cache data locally, and write updates directly back to disk. For
shared workloads, GPFS optimizes locking granularity and metadata update
algorithms to minimize interactions between nodes, so shared data can be
read and written at maximum I/O bandwidth speeds.
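
A highly simplified sketch of the idea behind range-based distributed locking
follows; it is not the GPFS token protocol itself, and the class and method
names are invented for illustration. A node holding a token for a byte range
can cache and update that range with no further messages; a conflicting
request from another node revokes only the overlapping bytes, so non-shared
workloads keep their tokens and run at near-local speed.

# Simplified sketch (not the GPFS implementation) of byte-range token
# negotiation between nodes sharing a file.

from dataclasses import dataclass

@dataclass
class Token:
    node: str
    start: int
    end: int          # exclusive; a large sentinel stands in for "end of file"

class TokenManager:
    def __init__(self):
        self.tokens: list[Token] = []

    def acquire(self, node: str, start: int, end: int) -> Token:
        surviving = []
        for t in self.tokens:
            if t.node != node and t.start < end and start < t.end:
                # Conflict: shrink the existing holder's token so only the
                # overlapping bytes move; the holder keeps the rest cached.
                if t.start < start:
                    surviving.append(Token(t.node, t.start, start))
                if end < t.end:
                    surviving.append(Token(t.node, end, t.end))
            else:
                surviving.append(t)
        token = Token(node, start, end)
        surviving.append(token)
        self.tokens = surviving
        return token

# Node A acquires one token covering the whole file, so its later writes need
# no messages; when node B writes the second half, only that range is revoked.
END = 1 << 62
mgr = TokenManager()
mgr.acquire("A", 0, END)
mgr.acquire("B", END // 2, END)
print(mgr.tokens)
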
For high availability, GPFS must allow uninterrupted file system access
in the presence of node failures, disk failures, and system maintenance
operations. Similar to other journaling file systems, GPFS records all metadata
 