contain the actual file contents. For small files, this may be a single datafile,
and for larger files this may be as many as desired, up to the number of
file servers in the system. The metafile contains a list of the datafiles for the
file and a distribution method used to map logical file data to the datafiles.
Directories have a similar structure with one dirmetafile, and one or more
dirdatafiles. Dirdatafiles contain directory entries (dirents) with references to
the metafile of each file in the directory. An extensible hashing scheme is used
to locate dirents among the dirdatafiles. OrangeFS uses a database to hold
attributes, leaving the management of bytestreams (the storage component of
a datafile) to each node's local file system. After accessing the metadata server
once for a file's location, an OrangeFS client can thereafter interface directly
with the data servers, eliminating a major bottleneck.
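
To illustrate how a distribution method can map logical file data to datafiles, here is a minimal sketch assuming a simple round-robin stripe. The `Metafile` class, the `locate` function, and the `stripe_size` parameter are hypothetical names chosen for this example, not OrangeFS identifiers, and the actual distribution in use may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Metafile:
    """Hypothetical stand-in for a metafile: a datafile list plus a stripe size."""
    datafiles: List[str]  # names of the datafiles, in stripe order
    stripe_size: int      # bytes per stripe unit (assumed parameter)


def locate(meta: Metafile, logical_offset: int) -> Tuple[str, int]:
    """Map a logical file offset to (datafile, offset within that datafile)."""
    stripe_index = logical_offset // meta.stripe_size
    within_stripe = logical_offset % meta.stripe_size
    n = len(meta.datafiles)
    datafile = meta.datafiles[stripe_index % n]
    # Each full round over all datafiles adds one stripe unit to every datafile.
    local_offset = (stripe_index // n) * meta.stripe_size + within_stripe
    return datafile, local_offset


meta = Metafile(datafiles=["df0", "df1", "df2"], stripe_size=65536)
print(locate(meta, 0))              # first byte lands in the first datafile
print(locate(meta, 65536 + 10))     # second stripe unit lands in the second datafile
print(locate(meta, 3 * 65536 + 5))  # fourth stripe unit wraps back to the first
```

Because the mapping depends only on the metafile contents, a client that has read the metafile once can compute which data server to contact for any offset without asking the metadata server again.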
10.2.3.1 Trove
Servers in the file system operate on objects through Trove. Trove is a
software abstraction layer that implements dataspaces (or objects), provides
a non-blocking interface to those objects, and serves as a means to manage
multiple implementations of itself (optimized for different environments). Each
Trove object has its own unique handle, a bytestream, and a set of key-value
pairs. Every object within a given file system can be located using its handle.
Bytestreams are sequences of bytes with an arbitrary length that generally
store file data. Key-value pairs allow data to be stored and retrieved using a
"key" and generally store attributes and other file metadata. Objects serve
different purposes in the file system and may utilize the bytestream, the
key/value pairs, or both, as needed. Trove has a range of methods for accessing the
different parts of an object. There are several implementations of the Trove
methods, but currently all of them implement bytestreams in a local file system
(ext3 [9], xfs [10], zfs [8], etc.) and key-value pairs in a key-value store
(i.e., Berkeley DB [2]). Different implementations vary in how they manage
concurrency and how they interact with storage. For example, the DirectIO
method is optimized for use with servers that have large commercial RAID
back-ends.
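
The object model described above (a unique handle, a bytestream, and a set of key-value pairs) can be sketched as a small in-memory store. This is an illustration only: `ObjectStore`, `DataspaceObject`, and all method names are hypothetical and are not the Trove API, and the sketch is synchronous, whereas Trove's interface is non-blocking.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DataspaceObject:
    """Illustrative object: a handle, a bytestream, and key-value pairs."""
    handle: int
    bytestream: bytearray = field(default_factory=bytearray)
    keyvals: Dict[str, bytes] = field(default_factory=dict)


class ObjectStore:
    """Hypothetical in-memory store; real implementations use a local
    file system for bytestreams and a key-value store for the pairs."""

    def __init__(self) -> None:
        self._objects: Dict[int, DataspaceObject] = {}
        self._next_handle = itertools.count(1)

    def create(self) -> int:
        """Create an empty object and return its unique handle."""
        handle = next(self._next_handle)
        self._objects[handle] = DataspaceObject(handle)
        return handle

    def bstream_write(self, handle: int, offset: int, data: bytes) -> None:
        stream = self._objects[handle].bytestream
        end = offset + len(data)
        if len(stream) < end:
            stream.extend(b"\0" * (end - len(stream)))  # zero-fill any gap
        stream[offset:end] = data

    def bstream_read(self, handle: int, offset: int, size: int) -> bytes:
        return bytes(self._objects[handle].bytestream[offset:offset + size])

    def keyval_set(self, handle: int, key: str, value: bytes) -> None:
        self._objects[handle].keyvals[key] = value

    def keyval_get(self, handle: int, key: str) -> bytes:
        return self._objects[handle].keyvals[key]


store = ObjectStore()
datafile = store.create()   # this object uses only its bytestream
store.bstream_write(datafile, 4, b"data")
metafile = store.create()   # this object uses only its key-value pairs
store.keyval_set(metafile, "datafiles", str(datafile).encode())
print(store.bstream_read(datafile, 0, 8))
print(store.keyval_get(metafile, "datafiles"))
```

The two objects in the usage lines mirror the point made in the text: a datafile-like object stores file data in its bytestream, while a metafile-like object keeps attributes in key-value pairs.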
10.2.4 Bulk Messaging Interface
Clients and servers communicate with each other over a local-area network.
OrangeFS provides a bulk messaging interface (BMI) as a layer that
provides a common interface for many different network fabrics and uses the
post/test non-blocking model to allow the server to manage concurrency. Much
like Trove, BMI defines a common interface and an internal set of methods
that can be implemented by many different network substrates. To date there
are BMI methods for TCP/IP, MX (Myrinet), IB (InfiniBand), Portals, and
others. For those networks that provide a zero-copy interface, BMI allows
OrangeFS to bypass kernel interaction where possible.
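
The post/test non-blocking model mentioned above can be sketched roughly as follows: a caller posts an operation and gets an identifier back immediately, then tests that identifier later to see whether the operation has completed. The `LoopbackTransport` class and its methods are hypothetical names for this illustration and are not the BMI API; the `progress` method stands in for whatever the underlying network does to complete work.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class LoopbackTransport:
    """Hypothetical transport exposing a post/test style interface."""

    def __init__(self) -> None:
        self._next_id = 0
        self._pending: Deque[Tuple[int, bytes]] = deque()
        self._done: Dict[int, bytes] = {}

    def post_send(self, payload: bytes) -> int:
        """Queue a send and return immediately with an operation id."""
        op_id = self._next_id
        self._next_id += 1
        self._pending.append((op_id, payload))
        return op_id

    def progress(self) -> None:
        """Complete one pending operation (stands in for network progress)."""
        if self._pending:
            op_id, payload = self._pending.popleft()
            self._done[op_id] = payload

    def test(self, op_id: int) -> Optional[bytes]:
        """Return the payload if the operation finished, else None. Never blocks."""
        return self._done.pop(op_id, None)


transport = LoopbackTransport()
first = transport.post_send(b"request A")
second = transport.post_send(b"request B")  # posted before the first completes
print(transport.test(first))                # not complete yet
transport.progress()
print(transport.test(first))                # first operation has completed
print(transport.test(second))               # second is still pending
```

Since neither `post_send` nor `test` blocks, a single server loop can keep many operations in flight and service whichever ones complete first, which is the concurrency benefit the text attributes to this model.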
 