the metanode. On the other hand, the common cases of read-only readers, write-only writers, or reads and updates within an existing file with no
file size changes can run concurrently without any inode lock conflicts. For applications that do not require accurate mtime, GPFS offers a "stat-lite" option that propagates mtime changes asynchronously and handles stat() calls without revoking inode tokens. This behavior is the default for access time (atime). Furthermore, in the common case of a single writer, the node writing to the file becomes the metanode for the file and therefore incurs no additional overhead for sending metadata updates over the network.
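The benefit of the single-writer case can be illustrated with a toy Python model that counts how many size/mtime updates must cross the network to reach the metanode. This is a sketch under stated assumptions, not GPFS code: the class name `FileMetadataModel` is hypothetical, and the rule that the first node to write becomes the metanode is an assumption made only so the shared-writer case has a well-defined metanode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileMetadataModel:
    """Hypothetical toy model: tracks one file's size/mtime and counts the
    metadata updates that must be sent over the network to its metanode."""
    metanode: Optional[str] = None
    network_messages: int = 0
    size: int = 0
    mtime: int = 0

    def write(self, node: str, end_offset: int, now: int) -> None:
        # Assumption: the first node that writes the file becomes its metanode.
        if self.metanode is None:
            self.metanode = node
        # Updates from any node other than the metanode must be shipped to it.
        if node != self.metanode:
            self.network_messages += 1
        # The metanode merges updates: size only grows, mtime keeps the latest.
        self.size = max(self.size, end_offset)
        self.mtime = max(self.mtime, now)


# Single writer: the writer is the metanode, so no metadata traffic at all.
single = FileMetadataModel()
for t in range(100):
    single.write("nodeA", (t + 1) * 4096, now=t)

# Two alternating writers: every update from nodeB is a network message.
shared = FileMetadataModel()
for t in range(100):
    shared.write("nodeA" if t % 2 == 0 else "nodeB", (t + 1) * 4096, now=t)
```

In this model `single.network_messages` stays at zero, while in the shared case each of nodeB's 50 writes costs one message to nodeA.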
When writing a new file or extending an existing file, each node independently allocates disk space for the data blocks it writes. For this purpose, byte-range tokens are rounded up to block boundaries, so only one node allocates storage for any particular data block. The block allocation map, which records the allocation status (free or in use) of all disk blocks in the file system, is divided into a large, fixed number of separately lockable regions, and each region contains the allocation status of a fraction of the disk blocks on every disk in the file system. Hence, access to a single region with enough free space is sufficient for a node to properly stripe the files it writes across all disks. One of the nodes in the cluster acts as the allocation manager, which collects free-space statistics about all allocation regions and provides hints about which region to try whenever a node runs out of disk space in the region it is currently using. To the extent possible, the allocation manager prevents lock conflicts between nodes by directing different nodes to different regions.
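The following Python sketch models the two ideas in this paragraph: rounding a byte range outward to whole blocks, and an allocation map whose regions each cover a slice of every disk. It is a simplified illustration, not the GPFS data structure. The names `round_to_blocks`, `AllocationMap`, `hint`, and `allocate_stripe` are hypothetical, as are the 1 MiB block size and the choice to assign block b of each disk to region `b % num_regions`; the "most free space among unlocked regions" policy is only one plausible way an allocation manager could steer nodes apart.

```python
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 1 << 20  # hypothetical 1 MiB file system block size


def round_to_blocks(start: int, end: int, block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Expand the byte range [start, end) outward to whole blocks, so two
    nodes never hold tokens covering different parts of the same block."""
    lo = (start // block_size) * block_size
    hi = -(-end // block_size) * block_size  # ceiling division
    return lo, hi


class AllocationMap:
    """Toy allocation map: block b of every disk belongs to region
    b % num_regions, so each region covers a slice of every disk."""

    def __init__(self, num_disks: int, blocks_per_disk: int, num_regions: int):
        self.num_regions = num_regions
        # free[r][d] lists the free block numbers on disk d tracked by region r.
        self.free: List[List[List[int]]] = [
            [[b for b in range(blocks_per_disk) if b % num_regions == r]
             for _ in range(num_disks)]
            for r in range(num_regions)
        ]
        self.owner: Dict[int, str] = {}  # region -> node currently holding its lock

    def free_count(self, region: int) -> int:
        return sum(len(disk) for disk in self.free[region])

    def hint(self, node: str) -> Optional[int]:
        """Allocation-manager role: suggest the region with the most free space
        among those not held by another node (None if nothing is available)."""
        candidates = [r for r in range(self.num_regions)
                      if self.owner.get(r) in (None, node) and self.free_count(r) > 0]
        return max(candidates, key=self.free_count, default=None)

    def allocate_stripe(self, node: str, region: int) -> List[Tuple[int, int]]:
        """Lock one region and take one free block per disk from it, striping
        across all disks without touching any other region."""
        assert self.owner.get(region) in (None, node), "region held by another node"
        self.owner[region] = node
        stripe = []
        for disk, blocks in enumerate(self.free[region]):
            if blocks:
                stripe.append((disk, blocks.pop(0)))
        return stripe


amap = AllocationMap(num_disks=4, blocks_per_disk=16, num_regions=8)
r_a = amap.hint("nodeA")
stripe_a = amap.allocate_stripe("nodeA", r_a)
r_b = amap.hint("nodeB")  # steered away from nodeA's region
stripe_b = amap.allocate_stripe("nodeB", r_b)
```

Here nodeA locks region 0 and gets one block on each of the four disks; the hint then sends nodeB to region 1, so the two nodes stripe across all disks without contending for the same lock.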
9.2.3.3 Concurrent Directory Updates
To support efficient lookups in very large directories, GPFS uses extendible hashing [5] to organize entries within a directory. The directory block that contains the entry for a particular file name can be found by hashing the name and using the n low-order bits of the hash value as the block number, where n depends on the size of the directory.
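The block lookup can be sketched in a few lines of Python. This is a simplified illustration, assuming a directory in which all 2^n blocks exist; the function name `directory_block_for` is hypothetical, and `zlib.crc32` is only a stand-in because the hash function GPFS actually uses is not specified here.

```python
import zlib


def directory_block_for(name: str, n: int) -> int:
    """Hash the file name and keep the n low-order bits as the block number.
    zlib.crc32 is a stand-in for whatever hash the file system really uses."""
    h = zlib.crc32(name.encode("utf-8"))
    return h & ((1 << n) - 1)


names = [f"file{i}.dat" for i in range(1000)]
for name in names:
    b4 = directory_block_for(name, 4)
    b5 = directory_block_for(name, 5)
    # Using one more hash bit splits block b into b and b + 2**4;
    # an entry never moves to any other block.
    assert b5 in (b4, b4 + 16)
```

The loop shows why low-order bits suit a growing directory: increasing n by one splits each block into exactly two, so only the entries of the block being split need to move.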
Handling directory updates in a cluster efficiently poses a challenge, because write sharing for directories is more common and much finer grained than for regular files. Each file create or delete operation updates a single entry in a directory block that can hold thousands of entries, and there is little locality because hashing randomizes the placement of directory entries. Hence, when concurrent directory updates are detected, GPFS switches to a finer-grained locking mode, in which a directory operation locks the hash value of the file name being inserted or deleted instead of the directory block being updated. This allows creates and deletes of different files to proceed concurrently, even if the associated directory entries fall in the same directory block, while still properly synchronizing concurrent operations on the same file name. The metanode then collects the directory updates from multiple nodes and writes the modified directory blocks to disk.
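The difference between the two locking granularities can be demonstrated with a single-process Python analogy, where `threading.Lock` objects stand in for cluster-wide lock tokens. This is a hypothetical sketch, not GPFS's distributed token manager: the class `DirectoryLocks` and its methods are invented for illustration, `zlib.crc32` stands in for the real name hash, and how GPFS detects concurrent updates and switches modes is left out.

```python
import threading
import zlib
from typing import Dict, Tuple


class DirectoryLocks:
    """Hypothetical sketch: one lock per directory block (coarse mode) or
    one lock per file-name hash value (fine-grained mode)."""

    def __init__(self, n_bits: int):
        self.n_bits = n_bits  # low-order hash bits that select a directory block
        self._guard = threading.Lock()
        self._locks: Dict[Tuple[str, int], threading.Lock] = {}

    def lock_key(self, name: str, fine_grained: bool) -> Tuple[str, int]:
        h = zlib.crc32(name.encode("utf-8"))  # stand-in hash function
        if fine_grained:
            return ("hash", h)  # lock only this file name's hash value
        return ("block", h & ((1 << self.n_bits) - 1))  # lock the whole block

    def lock_for(self, name: str, fine_grained: bool) -> threading.Lock:
        key = self.lock_key(name, fine_grained)
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())


locks = DirectoryLocks(n_bits=2)  # 4 directory blocks
names = [f"file{i}" for i in range(100)]
# Pick two different names whose entries land in the same directory block.
a, b = next((x, y) for x in names for y in names
            if x < y
            and locks.lock_key(x, False) == locks.lock_key(y, False)
            and locks.lock_key(x, True) != locks.lock_key(y, True))

# Fine-grained mode: creating `a` does not block creating `b`.
la, lb = locks.lock_for(a, True), locks.lock_for(b, True)
with la:
    fine_concurrent = lb.acquire(blocking=False)
    if fine_concurrent:
        lb.release()

# Block mode: both names map to the same lock, so the second update must wait.
ba, bb = locks.lock_for(a, False), locks.lock_for(b, False)
with ba:
    block_concurrent = bb.acquire(blocking=False)
    if block_concurrent:
        bb.release()
```

With hash-value locks, `fine_concurrent` is True even though both entries share a block, whereas block locks make `block_concurrent` False; operations on the same name still map to the same lock in either mode.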
The global lock manager handling the lock tokens for a directory monitors
access patterns through the token requests it receives and dynamically adjusts