2.2.2.3 Lustre
Lustre is a shared file system for clusters originally developed by Cluster File
Systems, Inc. [2] In the June 2006 TOP500 list, over 70 of the 500 supercomputers
used Lustre technology. [9] The Lustre architecture is made up of file system
clients, metadata servers (MDSs), and object-storage servers (OSSs). File system
clients handle the requests from user applications and communicate with
MDSs and OSSs. MDSs maintain a transactional record of high-level file information,
such as directory hierarchy, file names, and striping configurations.
OSSs provide file I/O service, and each can be responsible for multiple object
storage targets (OSTs). An OST is an interface to a single, exported backend
storage volume.
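As a rough illustration of how these components relate, the following Python sketch models the MDS, OSS, and OST roles as plain data classes. All class, field, and function names here are hypothetical and chosen for illustration only; they do not correspond to Lustre's actual data structures or APIs.

```python
from dataclasses import dataclass, field


@dataclass
class OST:
    """Object storage target: interface to one exported backend volume."""
    name: str


@dataclass
class OSS:
    """Object-storage server: serves file I/O for one or more OSTs."""
    name: str
    osts: list[OST] = field(default_factory=list)


@dataclass
class MDS:
    """Metadata server: holds file names, hierarchy, and striping config."""
    name: str
    # Hypothetical layout table: file path -> names of OSTs holding its objects.
    layouts: dict[str, list[str]] = field(default_factory=dict)


def find_oss(servers: list[OSS], ost_name: str) -> OSS:
    """Return the OSS currently responsible for the named OST."""
    for oss in servers:
        if any(ost.name == ost_name for ost in oss.osts):
            return oss
    raise KeyError(ost_name)


# Illustrative cluster: two OSSs, each responsible for two OSTs.
servers = [
    OSS("oss0", [OST("ost0"), OST("ost1")]),
    OSS("oss1", [OST("ost2"), OST("ost3")]),
]
mds = MDS("mds0", {"/data/results.bin": ["ost1", "ost2"]})

# A client asks the MDS for a file's layout, then contacts the OSSs directly.
targets = [find_oss(servers, n).name for n in mds.layouts["/data/results.bin"]]
```

In this toy cluster the file's two objects sit behind different OSSs, so a client would talk to both servers for file data and to the MDS only for metadata.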
Lustre is an object-based file system targeting strong security, file portability
across platforms, and scalability. An object is a storage container of variable
length and can be used to store various types of data, such as traditional
files, database records, and multimedia data. Lustre implements object-based
device drivers in OSTs to offload block-based disk space management and
file-to-block mapping tasks from the servers. The object-based device functionalities
are built on top of the ext3 file system, with objects stored in files.
Similar to traditional UNIX file systems, Lustre uses inodes to manage file
metadata. However, the inode of a file on the MDSs does not point to data
blocks but instead points to one or more objects associated with the file.
In order to improve I/O performance for large file accesses, a file can be
striped into multiple objects across multiple OSTs. The file-to-object mapping
information, along with other metadata for the file, is stored on an MDS. When
a client opens a file, the MDS returns the file's inode so that the client can use
this information to convert file access requests into one or more object access
requests. Multiple object access requests are performed in parallel directly to
multiple OSTs where the objects are stored. Lustre's metadata server software
is multithreaded to improve metadata performance. Substantial modifications
to ext3 and the Linux VFS have been made to enable fine-grained locking of
a single directory. This optimization proves very scalable for file creations and
lookups in a single directory with millions of files.
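The conversion from a file access request to object access requests can be sketched as follows. This is a minimal sketch that assumes round-robin striping with a fixed stripe size; the function name `map_request`, its parameters, and the 1 MiB stripe size in the usage example are assumptions made for illustration, not details given in the text.

```python
def map_request(offset: int, length: int, stripe_size: int, stripe_count: int):
    """Split a byte range of a file into (object index, object offset, length).

    Assumes round-robin striping: stripe unit k of the file lives in
    object k % stripe_count, at unit position k // stripe_count.
    """
    pieces = []
    end = offset + length
    while offset < end:
        unit = offset // stripe_size       # which stripe unit of the file
        within = offset % stripe_size      # position inside that unit
        obj_index = unit % stripe_count    # object (and OST) holding the unit
        obj_offset = (unit // stripe_count) * stripe_size + within
        chunk = min(stripe_size - within, end - offset)
        pieces.append((obj_index, obj_offset, chunk))
        offset += chunk
    return pieces


MiB = 1 << 20
# A 3 MiB read starting 512 KiB into a file striped over 2 objects.
requests = map_request(offset=MiB // 2, length=3 * MiB,
                       stripe_size=MiB, stripe_count=2)
```

Each tuple in `requests` names one object and a byte range inside it. Pieces that target different objects are independent, which is what allows a client to issue them to the corresponding OSTs in parallel.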
The Lustre failover technique protects the file metadata through MDS replication.
This is done by directly connecting the MDSs to a multiport disk array
in an OST. When one MDS fails, the replicated MDS takes over. In addition,
MDSs can be configured as an active/passive pair. Often the standby MDS
is the active MDS for another Lustre file system, so no MDSs are idle. To
handle OSS failure, Lustre attaches each OST to multiple OSSs. Therefore,
if one OSS fails, its OSTs remain accessible through the failover OSSs.
Lustre also provides journaling and sophisticated protocols to resynchronize
the cluster in a short time.
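The OSS failover idea can be illustrated with a short sketch: each OST lists the OSSs it is attached to, and a client uses the first one that is still up. The names `OST_ATTACHMENTS` and `pick_oss` and the in-memory set of failed servers are hypothetical; the text does not describe how Lustre detects a failure or selects the failover server.

```python
# Hypothetical attachment table: each OST is reachable through two OSSs,
# listed in order of preference (primary first, failover second).
OST_ATTACHMENTS = {
    "ost0": ["oss0", "oss1"],
    "ost1": ["oss1", "oss0"],
}


def pick_oss(ost: str, failed: set[str]) -> str:
    """Return the first attached OSS that is not marked as failed."""
    for oss in OST_ATTACHMENTS[ost]:
        if oss not in failed:
            return oss
    raise RuntimeError(f"no OSS available for {ost}")


normal = pick_oss("ost0", failed=set())
after_failure = pick_oss("ost0", failed={"oss0"})
```

Here `ost0` is served by `oss0` in normal operation and by `oss1` once `oss0` is marked as failed. Because each server is also the primary for another OST, neither sits idle, which mirrors the arrangement the text describes for MDS pairs.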
Lustre is a POSIX-compliant file system. In particular, Lustre enforces the
atomicity of read and write operations. When application threads on different
compute nodes read and write the same part of a file simultaneously, they