ACLs), modification state (size, links, timestamps), open read/write/execute
state, object layout, and extended attributes. For directories, it is also possible
to lock individual entries in a directory using the hash of the filename as
part of the DLM resource name, to allow concurrent lookup, insertion, and
removal of entries in a directory.
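As a rough illustration of per-entry directory locking, the sketch below builds a lock resource name from a directory FID plus a hash of the filename. The `ResourceName` tuple, the `entry_resource` helper, the string form of the FID, and the use of CRC32 are all assumptions made for this example; they are not the actual Lustre resource layout or hash function.

```python
import zlib
from typing import NamedTuple, Optional


class ResourceName(NamedTuple):
    """Hypothetical DLM resource name for a directory or one of its entries."""
    dir_fid: str               # FID of the directory, shown here as a string
    name_hash: Optional[int]   # None means "the whole directory"


def entry_resource(dir_fid: str, filename: str) -> ResourceName:
    """Name the lock resource for a single entry in the directory."""
    # CRC32 is only a stand-in hash for this sketch.
    return ResourceName(dir_fid, zlib.crc32(filename.encode("utf-8")))


# Two entries in the same directory map to two distinct lock resources,
# so they can be looked up, inserted, or removed concurrently.
a = entry_resource("[0x200000400:0x1:0x0]", "a.txt")
b = entry_resource("[0x200000400:0x1:0x0]", "b.txt")
```

Two filenames that happen to hash to the same value would share one lock resource in this scheme; that costs some concurrency but not correctness.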
The MDT lock server will typically grant all of the IBITS for a resource
upon any enqueue request to reduce locking traffic, but will not grant bits for
contended attributes unless explicitly requested. This allows clients to cache
the lookup bit to do directory traversal, but the MDS can hold the update
bit to allow file creation or deletion within the directory without contention
on the lock bits.
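The grant policy described above can be sketched as a bitmask decision. This is a minimal illustration, assuming a hypothetical `InodeBits` flag set and a `bits_to_grant` helper; the bit names and values are chosen for the example and are not taken from the Lustre source. It models only which bits end up granted, not the revocation of conflicting locks that an explicit request for a contended bit would trigger.

```python
from enum import IntFlag


class InodeBits(IntFlag):
    """Illustrative inode lock bits (names and values are assumptions)."""
    LOOKUP = 0x01
    UPDATE = 0x02
    OPEN = 0x04
    LAYOUT = 0x08
    XATTR = 0x10


ALL_BITS = (InodeBits.LOOKUP | InodeBits.UPDATE | InodeBits.OPEN
            | InodeBits.LAYOUT | InodeBits.XATTR)


def bits_to_grant(requested: InodeBits, contended: InodeBits) -> InodeBits:
    """Grant every uncontended bit, plus any bit the client asked for explicitly."""
    return (ALL_BITS & ~contended) | requested


# A client traversing a directory asks only for LOOKUP while UPDATE is contended:
# it receives the other bits to cache, and the server keeps UPDATE.
granted = bits_to_grant(InodeBits.LOOKUP, contended=InodeBits.UPDATE)
```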
The MDS makes heavy use of LDLM intent locking to reduce contention
when many threads are creating files in a single directory. In the initial
lock enqueue to open and create a file, the client includes the filename, FID,
mode, permission, and other attributes with the lock request. The lock
server on the MDS can then execute the create request on behalf of the client
and return a lock on the FID associated with that filename (whether old or
new) instead of a lock on the parent directory. This avoids two extra network
round trips for enqueuing and canceling the parent directory lock, in favor of
a short MDS-local locking of the parent directory.
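The server side of such an intent request might look roughly like the sketch below. The `Directory` class and the `intent_open_create` function are hypothetical names introduced for illustration, and FIDs are shown as plain strings; mode and permission handling are omitted.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Directory:
    """Hypothetical in-memory stand-in for a directory on the MDS."""
    entries: Dict[str, str] = field(default_factory=dict)  # filename -> FID
    local_lock: threading.Lock = field(default_factory=threading.Lock)


def intent_open_create(parent: Directory, filename: str, client_fid: str) -> dict:
    """Handle an open/create intent carried inside a lock enqueue."""
    # Short, server-local lock on the parent; the client never holds it,
    # so no network round trips are spent enqueuing or canceling it.
    with parent.local_lock:
        # Reuse the existing FID if the name is already present, else create it.
        fid = parent.entries.setdefault(filename, client_fid)
    # Reply with a lock on the file's own FID, not on the parent directory.
    return {"fid": fid, "lock_resource": fid}
```

Because each reply locks a different FID, many clients creating distinct files in one directory contend only for the brief server-local section.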
8.2.6 Object Storage Server
OSSs export one or more OSTs, each of which is stored on a single underlying
OSD. Each OSS typically attaches to a high-capacity RAID 6 storage
array that normally provides between two and eight (or more) OSTs with tens
of TB of capacity and hundreds of MB/s of bandwidth.
Each OST operates completely independently of other OSTs, and the OSTs
are in fact totally unaware of other OSTs; there is no inter-OST
communication. Since the OST object namespace is not hierarchical, there are no
dependencies of any kind between objects. This ensures that bandwidth
continues to scale linearly as OSTs are added.
OST objects are identified by a FID, which remains constant for each object's
lifetime. Zero-length OST objects are normally pre-created in batches
on demand by the MDS to avoid latency during create, to reduce MDS-OSS
traffic, and to simplify transactional consistency of OST object assignment.
Upon first modification, the objects are labeled with their assigned parent
MDT inode FID (for recovery and verification) and have their user and group
ID set (for quota tracking).
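The batched pre-creation and first-write labeling can be sketched as follows. Everything here is an assumption for illustration: the class names `Ost`, `OstObject`, and `MdsObjectPool`, the batch size, and the use of small integers in place of real FIDs. Writes are treated as appends to keep the example short.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class OstObject:
    fid: int                          # stand-in for a real FID
    size: int = 0                     # pre-created objects are zero-length
    parent_fid: Optional[str] = None  # MDT inode FID, set on first modification
    uid: Optional[int] = None
    gid: Optional[int] = None


class Ost:
    def __init__(self) -> None:
        self.objects: Dict[int, OstObject] = {}
        self.next_fid = 1
        self.precreate_requests = 0   # counts MDS-to-OSS precreate calls

    def precreate(self, count: int) -> List[int]:
        """Create `count` zero-length objects in one request."""
        self.precreate_requests += 1
        fids = list(range(self.next_fid, self.next_fid + count))
        for fid in fids:
            self.objects[fid] = OstObject(fid)
        self.next_fid += count
        return fids

    def write(self, fid: int, nbytes: int, parent_fid: str, uid: int, gid: int) -> None:
        obj = self.objects[fid]
        if obj.parent_fid is None:
            # First modification: label the object for recovery and quota.
            obj.parent_fid, obj.uid, obj.gid = parent_fid, uid, gid
        obj.size += nbytes


class MdsObjectPool:
    """MDS-side pool of pre-created objects for one OST."""

    def __init__(self, ost: Ost, batch: int = 32) -> None:
        self.ost, self.batch = ost, batch
        self.free: Deque[int] = deque()

    def assign(self) -> int:
        """Hand out an object for a new file, refilling in batches on demand."""
        if not self.free:
            self.free.extend(self.ost.precreate(self.batch))
        return self.free.popleft()
```

With a batch size of 32, a file create usually takes an object from the local pool, and only about one create in 32 has to wait for a request to the OSS.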
Multiple clients may read and write the same object concurrently, and
each OST serves an LDLM namespace to resolve object access conflicts. Data
and cache coherency for OST objects is managed by byte-range DLM extent
locks for each object using the object FID as the lock resource. There may be
multiple non-overlapping write extent locks for a single object, and multiple
 