GPFS records all updates that affect file system consistency in a recovery log. There is a separate log for each node, stored on the shared disks. When GPFS detects a node failure through its internal heartbeat mechanism, a different cluster node reads and re-applies the updates recorded in the failed node's log before the locks held by the failed node are released. This guarantees that any metadata updated by the failed node is quickly restored to a consistent state and can then be accessed again by other nodes.
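As a rough illustration, the recovery sequence can be sketched as follows; all names here (LogRecord, recovery_logs, metadata_store, lock_table) are hypothetical stand-ins, not GPFS internals:

# Minimal sketch of per-node log recovery; illustrative only,
# not the actual GPFS implementation.
from dataclasses import dataclass

@dataclass
class LogRecord:
    object_id: int       # which metadata object the update touched
    new_metadata: bytes  # the logged after-image of that object

def recover_failed_node(failed_node, recovery_logs, metadata_store, lock_table):
    # Each node has its own log on the shared disks, so a surviving
    # node only needs to read the failed node's log.
    for record in recovery_logs[failed_node]:
        # Re-apply (redo) the logged update to the shared metadata.
        metadata_store[record.object_id] = record.new_metadata
    # Release the failed node's locks only after replay, so other
    # nodes never observe partially applied metadata updates.
    lock_table.release_all(failed_node)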
To protect against data loss or the unavailability of data due to failures in
the disk subsystem, GPFS provides two options: use of RAID-based storage
controllers together with redundant paths to disk, or replication at the file
system level. As an alternative to traditional RAID controllers, GPFS also
offers an advanced software RAID implementation integrated into the NSD
server called GPFS Native RAID (GNR) [7, 8]. If file system replication is
chosen, GPFS allocates and writes two or more copies of each data block
and/or metadata object.
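A sketch of the replication option, assuming copies are placed on disks that do not share a failure domain (GPFS calls these failure groups); the helper names are invented for illustration:

# Write 'replicas' copies of a block, one per failure group, so the
# loss of any single disk or enclosure leaves a readable copy.
def write_replicated(block_id, data, disks_by_failure_group, replicas=2):
    groups = list(disks_by_failure_group)
    if len(groups) < replicas:
        raise RuntimeError("need at least %d failure groups" % replicas)
    for group in groups[:replicas]:
        # pick_disk() and write() stand in for the real block
        # allocator and I/O path.
        disk = disks_by_failure_group[group].pick_disk()
        disk.write(block_id, data)

A read may then be served from any surviving copy.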
To avoid data becoming unavailable due to maintenance, GPFS supports on-line system management. This includes the ability to grow or shrink a file system by adding or removing disks and optionally rebalancing data and metadata in response to disk configuration changes, all while the file system is mounted. System software, including GPFS, can also be upgraded one node at a time without ever taking down the whole cluster.
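The rebalancing step amounts to migrating blocks from over-full disks to under-full ones while the file system stays mounted. A toy sketch of the idea (in a real system the block-pointer switch is atomic, which is what keeps concurrent readers safe):

# Toy rebalancer: allocation_map maps each disk name to the list of
# block IDs it currently holds; blocks are respread evenly in place.
def rebalance(allocation_map: dict) -> None:
    disks = list(allocation_map)
    total = sum(len(blocks) for blocks in allocation_map.values())
    target = total // len(disks)
    for donor in [d for d in disks if len(allocation_map[d]) > target]:
        while len(allocation_map[donor]) > target:
            dest = min(disks, key=lambda d: len(allocation_map[d]))
            if dest == donor:
                break
            # A real system copies the block first and then switches
            # its pointer atomically, so no reader is ever blocked.
            allocation_map[dest].append(allocation_map[donor].pop())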
9.2.3 Distributed Locking and Metadata Management
9.2.3.1 The Distributed Lock Manager
The GPFS distributed lock manager uses a collection of global lock managers running on a designated subset of nodes in the cluster, in conjunction with local lock managers on each file system node. For each file, directory, or
other file system object, a hash of the object ID is used to select one of the
global lock manager nodes to coordinate distributed locks for that object by
handing out lock tokens. Once a node has obtained a token from the global
lock manager responsible for the object, subsequent operations accessing the
object on that node can lock the object locally, without requiring additional
network communication. Additional network communication is only necessary
when an operation on another node requires a conflicting lock on the same
object.
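A minimal sketch of this token protocol, with a hypothetical request_token RPC standing in for the actual exchange with the global lock manager:

import hashlib

def global_manager_for(object_id: str, manager_nodes: list) -> str:
    # Hash the object ID to pick the node that coordinates its locks.
    h = int(hashlib.sha1(object_id.encode()).hexdigest(), 16)
    return manager_nodes[h % len(manager_nodes)]

def request_token(manager_node, object_id, mode):
    # Placeholder for the RPC to the global lock manager; the real
    # protocol may first revoke a conflicting token on another node.
    return mode

class LocalLockManager:
    def __init__(self, manager_nodes):
        self.manager_nodes = manager_nodes
        self.tokens = {}  # object_id -> token mode held on this node

    def lock(self, object_id, mode):
        held = self.tokens.get(object_id)
        # Fast path: a previously obtained token lets us lock the
        # object locally, with no network traffic at all.
        if held == mode or held == "write":
            return
        # Slow path: one round trip to the object's global manager.
        manager = global_manager_for(object_id, self.manager_nodes)
        self.tokens[object_id] = request_token(manager, object_id, mode)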
Lock tokens also serve as the mechanism for maintaining cache consistency between nodes. A "read-only" token may be shared among nodes and allows each token holder to cache objects it has read from disk. An "exclusive-write" token may only be held by one node at a time and allows that node to modify the object in its cache. When a write token is revoked or downgraded to read-only mode, GPFS first waits for local locks to be released and commits local changes to disk before allowing the token to be granted to another node. This serializes reads and writes to support the POSIX semantic that ensures a read sees the data from the most recently completed write.
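A sketch of this revocation path under the same assumptions as above (wait_for_unlock and the disk object are illustrative placeholders):

import time

def wait_for_unlock(object_id):
    # Placeholder: a real implementation would block on a condition
    # variable signalled when the local lock is dropped.
    time.sleep(0.01)

class TokenHolder:
    def __init__(self, disk):
        self.disk = disk
        self.dirty = {}           # object_id -> modified cached data
        self.local_locks = set()  # objects locked by local operations

    def on_revoke(self, object_id):
        # 1. Wait for local operations using the object to finish.
        while object_id in self.local_locks:
            wait_for_unlock(object_id)
        # 2. Commit cached changes so the next token holder reads
        #    current data: this is the read-after-write step.
        if object_id in self.dirty:
            self.disk.write(object_id, self.dirty.pop(object_id))
        # 3. Only now may the global manager grant the token elsewhere.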
 