GPFS - High Performance Parallel I/O

the metanode. On the other hand, the common cases of reader-only readers, writer-only writers, or reads and updates within an existing file with no file size changes, can run concurrently without any inode lock conflicts. For applications that do not require accurate mtime, GPFS offers a "stat-lite" option that propagates mtime changes asynchronously and handles stat() calls without revoking inode tokens. This option is the default for access time (atime). Furthermore, in the common case of a single writer, the node writing to the file becomes the metanode for the file, and therefore incurs no additional overhead for sending metadata updates over the network.

When writing a new file or extending an existing file, each node independently allocates disk space for the data blocks it writes. For this purpose, byte-range tokens are rounded up to block boundaries, so only one node allocates storage for any particular data block. The block allocation map, which records the allocation status (free or in-use) of all disk blocks in the file system, is divided into a large, fixed number of separately lockable regions, and each region contains the allocation status of a fraction of the disk blocks on every disk in the file system. Hence, access to a single region with enough free space is sufficient for a node to properly stripe the files it writes across all disks. One of the nodes in the cluster acts as the allocation manager, which collects free space statistics about all allocation regions and provides hints about which region to try whenever a node runs out of disk space in the region it is currently using. To the extent possible, the allocation manager prevents lock conflicts between nodes by directing different nodes to different regions.
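The allocation scheme described above can be illustrated with a small sketch. It is not GPFS code: the names round_to_blocks, Region, build_regions, and AllocationManager are hypothetical, the round-robin assignment of blocks to regions is an assumption, and the manager here reads free counts directly rather than collecting statistics over the network.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


def round_to_blocks(start: int, end: int, block_size: int) -> Tuple[int, int]:
    """Round the half-open byte range [start, end) outward to block boundaries."""
    lo = (start // block_size) * block_size
    hi = -(-end // block_size) * block_size  # ceiling division
    return lo, hi


@dataclass
class Region:
    """One separately lockable slice of the allocation map (hypothetical layout).

    free[d] lists the free block numbers this region tracks on disk d, so a
    single region can supply blocks on every disk.
    """
    free: Dict[int, List[int]]

    def free_count(self) -> int:
        return sum(len(blocks) for blocks in self.free.values())

    def allocate(self, disk: int) -> Optional[int]:
        """Take one free block on the given disk, or None if the region has none."""
        blocks = self.free[disk]
        return blocks.pop() if blocks else None


def build_regions(num_disks: int, blocks_per_disk: int, num_regions: int) -> List[Region]:
    """Deal each disk's blocks round-robin across the regions (an assumption)."""
    regions = [Region({d: [] for d in range(num_disks)}) for _ in range(num_regions)]
    for d in range(num_disks):
        for b in range(blocks_per_disk):
            regions[b % num_regions].free[d].append(b)
    return regions


class AllocationManager:
    """Gives region hints, steering different nodes to different regions."""

    def __init__(self, regions: List[Region]) -> None:
        self.regions = regions
        self.in_use: Dict[str, int] = {}  # node id -> region index last hinted

    def hint(self, node: str) -> int:
        # Prefer regions that no other node is currently using.
        taken = {idx for n, idx in self.in_use.items() if n != node}
        candidates = [i for i in range(len(self.regions)) if i not in taken]
        if not candidates:  # more nodes than regions: sharing is unavoidable
            candidates = list(range(len(self.regions)))
        best = max(candidates, key=lambda i: self.regions[i].free_count())
        self.in_use[node] = best
        return best
```

With 3 disks and 4 regions, two nodes asking for hints receive different regions, and either node can allocate one block on each disk from its own region, which is what lets it stripe a file across all disks while holding a single region lock.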

9.2.3.3 Concurrent Directory Updates

To support efficient lookups in very large directories, GPFS uses extendible hashing [5] to organize entries within a directory. The directory block that contains the entry for a particular file name can be found by hashing the name and using the n low-order bits of the hash value as the block number, where n depends on the size of the directory.
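The lookup rule can be sketched in a few lines. This is only an illustration: the hash function GPFS uses is not given here, so SHA-256 truncated to 32 bits stands in for it, and name_hash and directory_block are hypothetical names.

```python
import hashlib


def name_hash(name: str) -> int:
    """Stable 32-bit hash of a file name (a stand-in, not the GPFS hash)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def directory_block(name: str, n: int) -> int:
    """Block number is the n low-order bits of the name's hash value."""
    return name_hash(name) & ((1 << n) - 1)
```

One consequence of using low-order bits is that when n grows by one, an entry in block b either stays in block b or moves to block b + 2^n, so growing the directory only requires splitting blocks rather than rehashing every entry.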

Handling directory updates in a cluster efficiently poses a challenge, because write sharing for directories is more common and much finer grained than for regular files. Each file create or delete operation updates a single entry in a directory block that can hold thousands of entries, and there is little locality because hashing randomizes the placement of directory entries. Hence, when concurrent directory updates are detected, GPFS switches to a finer-grained locking mode, where a directory operation locks the hash value of the file name being inserted or deleted instead of the directory block being updated. This allows creates and deletes of different files to proceed concurrently, even if the associated directory entries fall in the same directory block, while still properly synchronizing concurrent operations on the same file name. The metanode then collects the directory updates from multiple nodes and writes the modified directory blocks to disk.
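The difference between the two locking modes can be shown with a toy lock table. This is a single-process sketch under stated assumptions, not the GPFS token protocol: DirectoryLocks, try_lock, and unlock are hypothetical names, and SHA-256 truncated to 32 bits again stands in for the real hash function.

```python
import hashlib
from typing import Dict, Tuple


def name_hash(name: str) -> int:
    """Stable 32-bit hash of a file name (a stand-in, not the GPFS hash)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class DirectoryLocks:
    """Toy lock table for one directory with 2**n_bits directory blocks."""

    def __init__(self, n_bits: int, fine_grained: bool) -> None:
        self.n_bits = n_bits
        self.fine_grained = fine_grained
        self.held: Dict[Tuple[str, int], str] = {}  # lock key -> owning node

    def lock_key(self, name: str) -> Tuple[str, int]:
        h = name_hash(name)
        if self.fine_grained:
            return ("hash", h)  # lock the hash value of the file name
        return ("block", h & ((1 << self.n_bits) - 1))  # lock the whole block

    def try_lock(self, node: str, name: str) -> bool:
        """Return True if the node now holds the lock covering this name."""
        owner = self.held.setdefault(self.lock_key(name), node)
        return owner == node

    def unlock(self, node: str, name: str) -> None:
        key = self.lock_key(name)
        if self.held.get(key) == node:
            del self.held[key]
```

With n_bits set to 0, every name falls in the same block. In block mode a second node creating a different file is refused until the first node unlocks, whereas in fine-grained mode both proceed and only operations on the same name conflict.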

The global lock manager handling the lock tokens for a directory monitors access patterns through the token requests it receives and dynamically adjusts
