as specified by user-defined policy rules. External pools allow the policy-driven
migration of rarely accessed data to archival storage.
Filesets provide a way to partition the file system namespace to allow
administrative operations at a finer granularity than that of the entire file
system. For example, GPFS allows defining user and group quotas separately
for each fileset; it allows creating snapshots of individual filesets instead of the
whole file system; and it allows placing limits on the amount of disk space
occupied by files in each fileset. Filesets also provide a convenient way to refer
to a collection of files in policy rules.
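As a concrete illustration, the shell commands below sketch typical fileset
administration. The file system name fs1, the fileset name projA, and the
quota values are hypothetical, and exact option spellings vary across GPFS
(IBM Spectrum Scale) releases:

    mmcrfileset fs1 projA                         # create a fileset in file system fs1
    mmlinkfileset fs1 projA -J /gpfs/fs1/projA    # link it into the namespace
    mmcrsnapshot fs1 projA:snap1                  # snapshot only this fileset
    mmsetquota fs1:projA --block 10T:12T          # per-fileset space limits (soft:hard)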
The Information Lifecycle Management (ILM) policy language supports an
SQL-like syntax for selecting files based on file attributes such as name,
owner, file size, and time stamps, as well as custom extended attributes set by
the user. GPFS distinguishes between placement policies, which are evaluated
at file creation time and determine initial file placement and replication at-
tributes of a file; and management policies, which are evaluated periodically or
at a user's request and allow managing files during their lifecycle. A policy rule
may change file replication; move files to another storage pool; delete files; or
invoke an arbitrary, user-specified command on the selected list of files. Data
migrated to external storage either via policy or a traditional external storage
manager is recalled on demand using the standard Data Management API
(DMAPI) [9].
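To make the rule syntax concrete, the sketch below shows a small policy file
together with the commands that install and evaluate it. The pool names,
fileset name, thresholds, and file patterns are hypothetical; the rule forms
(SET POOL, MIGRATE, DELETE) follow the documented policy language:

    /* Placement rules: evaluated when a file is created. */
    RULE 'db-files' SET POOL 'system' REPLICATE(2) WHERE UPPER(NAME) LIKE '%.DB'
    RULE 'default'  SET POOL 'data'

    /* Management rules: evaluated periodically or on demand. */
    RULE 'archive-cold' MIGRATE FROM POOL 'data' THRESHOLD(85,70)
         TO POOL 'archive'
         WHERE (CURRENT_TIMESTAMP - ACCESS_TIME) > INTERVAL '90' DAYS
    RULE 'purge-scratch' DELETE FOR FILESET ('scratch')
         WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) > INTERVAL '30' DAYS

Assuming the rules are stored in a file ilm.pol, the placement rules are
installed with mmchpolicy and the management rules are run with mmapplypolicy:

    mmchpolicy fs1 ilm.pol          # install the policy, including placement rules
    mmapplypolicy fs1 -P ilm.pol    # evaluate the management rules now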
9.2.4.3 Wide-Area Caching and Replication
Although it is possible to configure a GPFS cluster to span data cen-
ter boundaries or use cross-cluster mounts to share data between clusters in
different data centers, GPFS configurations work best over dedicated, high-
bandwidth, low-latency connections, typically within a data center. Active
file management (AFM) is a scalable, high-performance caching layer inte-
grated into GPFS that makes it possible to share data effectively across ge-
ographically distributed sites using networks with fluctuating latencies and
even occasional outages [8, 4].
Data that reside in one cluster are cached on disk in one or more other local
or remote clusters. Once cached, data from a remote cluster can be accessed
with the same parallel I/O performance as local data and remain available
during network outages (disconnected operation). Changes are written to local
disk first and then propagated asynchronously between clusters. Customizable
propagation delays and rules for resolving conflicting updates originating in
different clusters provide an eventual consistency model across clusters. Thus,
AFM trades strict POSIX consistency across clusters for high-performance
access to remote data over unreliable networks.
To make optimum use of available network bandwidth, data may be
transferred in parallel between multiple nodes and over multiple connections
between clusters. For its data transport protocol, AFM can use the NSD and
NFS protocols, allowing access to data that resides in both GPFS and non-
GPFS file systems.
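A minimal sketch of setting up an AFM cache fileset follows, assuming a home
cluster reachable as home1 that exports /gpfs/home1/data; all names are
illustrative, and the AFM attribute syntax varies by release:

    mmcrfileset fs1 cache1 --inode-space new \
        -p afmTarget=home1:/gpfs/home1/data \
        -p afmMode=single-writer                  # local writes flow asynchronously to home
    mmlinkfileset fs1 cache1 -J /gpfs/fs1/cache1  # expose the cache in the namespace
    mmchfileset fs1 cache1 -p afmAsyncDelay=30    # seconds to hold updates before pushing them

Other AFM modes, such as read-only, local-updates, and independent-writer,
change which side of the relationship may generate updates and therefore how
conflicting changes can arise.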
 