tions of the ownership of Lustre interests and support organizations were
navigated.
New and unique performance issues were anticipated in a general sense,
inasmuch as: (1) the BG/Q system was different enough from previous Blue
Gene generations, due to the new system software, if nothing else; (2) the
Lustre client was all new code; (3) ZFS as a back-end file system for storage
and the metadata server would have an entirely unknown performance profile;
and (4) LC changed storage hardware vendors.
LC's benchmarks, which included data throughput and metadata measures,
appeared likely to be sufficient, but turned out to miss some factors relevant
to performance requirements in production. LC users initially raised a
greater-than-usual number of complaints regarding interactive responsiveness.
There were initial ZFS metadata server performance issues that were addressed,
and while absolute performance is not yet entirely on par with ldiskfs,
the work is still in its infancy and is expected to improve rapidly.
The version of ZFS then in use employed write-throttling behavior to ensure
that reads were not starved by a write workload. However, this write throttling
proved overzealous and prevented certain key write workloads from running
at acceptable rates. LC worked around the issue by disabling the throttling
option, while the ZFS community addresses the underlying problem; it
will no longer be an issue in upcoming releases of ZFS.
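As an illustration of this kind of workaround, the following is a minimal sketch, assuming a Sequoia-era ZFS on Linux build that exposes the legacy throttle as the zfs_no_write_throttle module parameter under /sys/module/zfs/parameters/. The parameter name and path are assumptions, not taken from the text; newer ZFS releases replaced this throttle with a dirty-data-based delay mechanism and drop the knob entirely.

#!/usr/bin/env python3
# Minimal sketch: disable the legacy ZFS write throttle at runtime.
# Assumption: an older ZFS on Linux release that still exposes the
# zfs_no_write_throttle module parameter; newer releases use a
# dirty-data throttle (zfs_dirty_data_max / zfs_delay_*) instead.

from pathlib import Path

PARAM = Path("/sys/module/zfs/parameters/zfs_no_write_throttle")

def disable_write_throttle() -> None:
    """Set the legacy throttle parameter to 1 (throttling disabled)."""
    if not PARAM.exists():
        raise FileNotFoundError(
            f"{PARAM} not present; this ZFS build likely uses the "
            "newer dirty-data throttle instead of the legacy one."
        )
    print("current zfs_no_write_throttle =", PARAM.read_text().strip())
    PARAM.write_text("1\n")  # 1 == do not throttle writes
    print("new     zfs_no_write_throttle =", PARAM.read_text().strip())

if __name__ == "__main__":
    disable_write_throttle()  # writing the parameter requires root

To make such a setting persist across reboots, the same value would typically be placed in a modprobe configuration file, for example a line such as "options zfs zfs_no_write_throttle=1" in /etc/modprobe.d/zfs.conf, again assuming the legacy parameter is present in the installed ZFS version.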
Benefits were anticipated as well; indeed, these motivated the ZFS
decision in the first place, as discussed in Section 5.3. The reliability of the
new storage hardware has been a major benefit for the new generation of Lustre
file systems at LLNL.
5.7 Sequoia I/O in Practice
5.7.1 General Remarks
The Blue Gene series of systems from IBM departs from previous
systems and from the cost-effective Linux systems (peloton, TLCC1, TLCC2)
in that I/O calls are shipped from the large set of compute nodes to a smaller
set of I/O nodes supporting I/O and other system services. This organization
leads to an I/O subsystem that is relatively low-powered in comparison to the
TLCC2 systems, where every compute node can perform I/O operations. It is
also likely a preview of future HPC systems, including possible future
generations of TLCC systems. For those systems, configuring I/O nodes
that are more capable in terms of memory, processors, and even connectivity,
rather than following the IBM scaling architecture so closely, might be more
worthwhile.
On Sequoia, overall balance between Lustre client capability, network, and
disk system performance is good. Generally, the Lustre client software is LC's
 