5.3 Sequoia, Lustre® 2.0, and ZFS
Historically, Lustre supported ext4/ldiskfs servers, but as early as 2007,
the Livermore Lustre team began to question its viability for the Sequoia
era (and beyond) due to a variety of scalability and performance issues. ZFS was identified as the best technical solution, and work proceeded, with much of the initial port of ZFS from Solaris to Linux done at Livermore [15, 14].
The rationale for moving to ZFS was as follows. Previous Lustre servers used ext4 (ldiskfs), where random-write performance was bounded by the disk IOPS rate rather than by disk bandwidth; OST sizes were limited; fsck time was unacceptable; and expensive hardware was required to make disks reliable, whereas with the new technology expensive RAID controllers would be unnecessary. In late 2011, new systems were being deployed that required 50 PB of high-performance storage capacity. LC had new requirements for economical throughput: 512 GB/s to 1 TB/s. LC also needed copy-on-write to serialize random writes so that performance would no longer be bound by drive IOPS. In addition, the center needed a single-volume size limit of 16 EB and zero fsck time, with online data integrity and error handling.
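
ZFS meets the zero-fsck requirement by checking integrity online: every block pointer records a checksum of the block it references, so corruption is caught on each read and repaired from a redundant copy rather than found by an offline scan. The short Python sketch below is a toy illustration of that read path, not ZFS code; the class and function names are invented for exposition.

```python
import hashlib

class Block:
    """A stored block; its checksum lives in the parent's block pointer."""
    def __init__(self, data: bytes):
        self.data = data

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def read_block(copies: list, expected: str) -> bytes:
    """Validate on every read and self-heal bad copies; this continuous,
    online verification is what removes the need for an offline fsck."""
    for copy in copies:
        if checksum(copy.data) == expected:
            good = copy.data
            for other in copies:          # repair corrupt mirrors in place
                if other.data != good:
                    other.data = good
            return good
    raise IOError("all redundant copies failed checksum verification")

# Usage: corrupt one mirror, then read; the bad copy is detected and healed.
a, b = Block(b"lustre object data"), Block(b"lustre object data")
expected = checksum(a.data)               # recorded when the block was written
b.data = b"bit-rotted   data"             # simulate silent corruption
assert read_block([a, b], expected) == b"lustre object data"
assert b.data == a.data                   # the corrupt mirror was self-healed
```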
Extending Lustre to use a next-generation file system like ZFS allows the file system to achieve greater levels of scalability and introduces new functionality. Some of the new ZFS Lustre server features include data integrity, pooled storage, capacity, snapshots, compression, and copy-on-write. With the data integrity features in ZFS, data is always checksummed and self-repairing, avoiding silent corruption. Pooled storage allows easy aggregation of multiple devices into a single OST. The capacity features give a 256 ZB (2^78 bytes) OST size limit, enabling larger servers. Snapshots of the Lustre file system can be taken prior to maintenance and updates. In addition, transparent compression increases the total usable capacity. Finally, copy-on-write improves write performance by transforming random I/O into sequential I/O.
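
That last claim is easy to see in miniature: a copy-on-write allocator never overwrites a block in place; it appends each new block version at the next free device location and updates a block-pointer map, so writes that are random in the logical address space become sequential on the device. The following Python sketch is a deliberately simplified model, not the actual ZFS allocator.

```python
class CopyOnWriteDevice:
    """Toy CoW allocator: never overwrites in place, always appends."""
    def __init__(self):
        self.blocks = []        # physical device, grows sequentially
        self.block_map = {}     # logical block number -> physical location

    def write(self, logical_block: int, data: bytes) -> int:
        physical = len(self.blocks)   # next sequential location on the device
        self.blocks.append(data)      # one sequential append, no seek
        self.block_map[logical_block] = physical
        return physical

    def read(self, logical_block: int) -> bytes:
        return self.blocks[self.block_map[logical_block]]

dev = CopyOnWriteDevice()
# A workload that is random in the logical address space...
for logical in [907, 13, 441, 2, 620]:
    dev.write(logical, f"data-{logical}".encode())
# ...lands at consecutive physical locations, bounded by bandwidth, not IOPS.
print([dev.block_map[n] for n in [907, 13, 441, 2, 620]])  # [0, 1, 2, 3, 4]
```

The same redirection also underlies cheap snapshots: old block versions can simply be retained rather than freed.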
Using ZFS with Lustre was a major undertaking. Native kernel support
for ZFS on Linux was not available, so LLNL undertook the significant effort
required to make that a reality. The Lustre code itself had grown tightly
coupled to ldiskfs, and another significant programming effort was funded
by LLNL to create an abstracted OSD in Lustre. This allowed Lustre to
interact with any file system for which an OSD layer is created, and allowed
Lustre to initially support both ldiskfs and ZFS. Lustre support for ZFS
first appeared in Lustre version 2.4.0, released in May of 2013 [5].
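
The OSD layer can be pictured as a narrow object-storage interface that the upper Lustre layers code against, with one implementation per backend (the in-tree backends are osd-ldiskfs and osd-zfs). The real interface is a C API inside Lustre; the Python sketch below is only a conceptual analogue, with invented names, showing how one abstraction admits multiple backing file systems.

```python
from abc import ABC, abstractmethod

class ObjectStorageDevice(ABC):
    """Conceptual analogue of Lustre's abstracted OSD interface:
    upper layers call these methods, never a backend directly."""
    @abstractmethod
    def create(self, object_id: int) -> None: ...
    @abstractmethod
    def write(self, object_id: int, data: bytes) -> None: ...
    @abstractmethod
    def read(self, object_id: int) -> bytes: ...

class LdiskfsOSD(ObjectStorageDevice):
    """Stand-in for osd-ldiskfs; in reality it maps OSD calls to ext4."""
    def __init__(self): self._objects = {}
    def create(self, object_id): self._objects[object_id] = b""
    def write(self, object_id, data): self._objects[object_id] = data
    def read(self, object_id): return self._objects[object_id]

class ZfsOSD(ObjectStorageDevice):
    """Stand-in for osd-zfs; in reality it maps OSD calls to ZFS objects."""
    def __init__(self): self._objects = {}
    def create(self, object_id): self._objects[object_id] = b""
    def write(self, object_id, data): self._objects[object_id] = data
    def read(self, object_id): return self._objects[object_id]

def serve_object(osd: ObjectStorageDevice) -> bytes:
    """Upper-layer code is identical regardless of the backend chosen."""
    osd.create(42)
    osd.write(42, b"striped file data")
    return osd.read(42)

assert serve_object(LdiskfsOSD()) == serve_object(ZfsOSD())
```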
 