system named obdfs. Over the next three years, and with early funding from
the ASCI Path Forward project, obdfs evolved into the first versions of Lus-
tre [4] with a debut at number 5 in the Top500 on the 1000-node MCR cluster
at Lawrence Livermore National Laboratory (LLNL) [12]. Continued develop-
ment over the next ten years saw increasing adoption of Lustre on a wide range
of HPC systems in academia and industry. By 2013, Lustre was deployed on 7
out of the top 10 and around 60% of the top 100 supercomputers in the world
as listed by the Top500. Several of these support tens of thousands of clients,
tens of petabytes of capacity, and I/O performance of over 1 TB/s [16, 6].
8.2 Design and Architecture
8.2.1 Overview
Lustre is a Linux file system implemented entirely in the kernel. Its ar-
chitecture is founded upon distributed object-based storage. This delegates
block storage management to its back-end servers and eliminates significant
scaling and performance issues associated with the consistent management of
distributed block storage metadata.
Lustre objects come in two varieties: data objects, which are simple byte
arrays typically used to store the data of POSIX files, and index objects,
which are key-value stores typically used to implement POSIX directories.
These objects are implemented by the Lustre Object Storage Device (OSD),
an abstraction that enables the use of different back-end file systems, including
ext4 and ZFS. A single OSD instance corresponds to a single back-end storage
volume and is termed a storage target. The storage target depends on the
underlying file system for resilience to storage device failure, but may be
instantiated on any server that can attach to this storage to provide high
availability in the event of server or controller failure.
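To make this distinction concrete, the following Python sketch models the two
object varieties and a storage target that holds them. It is purely
illustrative and does not reflect Lustre's actual OSD interface; all class,
method, and identifier names here are invented for the example.

    class DataObject:
        """A data object: a simple byte array, as used for POSIX file data."""
        def __init__(self):
            self._bytes = bytearray()

        def write(self, offset, buf):
            end = offset + len(buf)
            if end > len(self._bytes):
                self._bytes.extend(b"\0" * (end - len(self._bytes)))
            self._bytes[offset:end] = buf

        def read(self, offset, length):
            return bytes(self._bytes[offset:offset + length])

    class IndexObject:
        """An index object: a key-value store, as used for POSIX directories."""
        def __init__(self):
            self._entries = {}

        def insert(self, name, object_id):
            self._entries[name] = object_id

        def lookup(self, name):
            return self._entries.get(name)

    class StorageTarget:
        """One OSD instance backing one storage volume (object IDs invented)."""
        def __init__(self):
            self._objects = {}
            self._next_id = 0

        def create(self, object_class):
            oid = self._next_id
            self._next_id += 1
            self._objects[oid] = object_class()
            return oid

        def get(self, oid):
            return self._objects[oid]

    # Example: a directory entry mapping a file name to its data object.
    target = StorageTarget()
    dir_oid = target.create(IndexObject)
    file_oid = target.create(DataObject)
    target.get(dir_oid).insert("readme.txt", file_oid)
    target.get(file_oid).write(0, b"hello")

In a real deployment these objects are persisted by the back-end file system
chosen for the target, such as ext4 or ZFS, rather than held in memory.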
Storage targets are exported either as metadata targets (MDTs), used for
file system namespace operations, or object targets (OSTs), used to store file
data. These are usually exported by servers configured specifically for their
respective metadata or data workloads: for example, RAID 10 storage hardware
and high core counts for metadata servers (MDSs), and high-capacity RAID 6
storage hardware and lower core counts for object storage servers (OSSs).
Historically, Lustre clusters have consisted of a pair of MDS nodes configured
for active-passive failover and multiple OSSs configured for active-active
failover.
More recent Lustre releases support multiple MDTs in the same file system, so
multiple MDS nodes configured for active-active failover are expected to
become more common.
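From a client's point of view, the MDTs and OSTs that make up a mounted file
system can be listed with the standard lfs df command. The short Python sketch
below simply drives that command and prints the target lines; the mount point
/mnt/lustre is an assumption for the example.

    import subprocess

    # List the metadata targets (MDTs) and object storage targets (OSTs) of a
    # mounted Lustre file system; /mnt/lustre is an assumed mount point.
    out = subprocess.run(["lfs", "df", "-h", "/mnt/lustre"],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if "MDT" in line or "OST" in line:
            print(line)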
Lustre clients and servers communicate with each other using a layered
communications stack. The underlying physical and/or logical networks such
 