configured with two co-processors) yielding a total peak system performance
of over 9.5 PFLOPS. Stampede has a 56-Gb/s FDR Mellanox InfiniBand net-
work connected in a fat tree configuration, which carries all high-speed traffic
(including both MPI and parallel file system data). System management traffic
is accommodated via a separate TCP/IP network.
7.2 I/O Hardware
The parallel file system components on Stampede are built from 76 indi-
vidual Dell DCS8200 commodity storage servers, each with sixty-four 3-TB
drives configured across six individual RAID devices. The RAID sets are set
up in an (8 + 2) RAID 6 configuration using standard Linux software RAID
tools with two wandering spares defined. Note that although standard soft-
ware RAID is used to manage the underlying devices, the ext4 journals are
purposefully separated from the raw storage target, and the journals are allo-
cated on mirrored RAID 1 partitions. This is a common approach with this
type of storage architecture and helps to increase file system performance.
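
As a rough sketch of the drive accounting implied by this layout, the short calculation below tallies how the 64 drives in each server are consumed by the six (8 + 2) RAID 6 sets and the two wandering spares, and derives the data capacity of a single RAID set. The variable names are illustrative, and the suggestion that the leftover drives could host the mirrored journal partitions is an assumption rather than a detail stated above.

    # Sketch: per-server drive accounting for one storage server (illustrative
    # names; placing the journal mirrors on the leftover drives is an assumption).
    DRIVES_PER_SERVER = 64    # 3-TB drives installed in each server
    RAID_SETS = 6             # individual RAID devices (one Lustre OST each)
    DATA_DRIVES = 8           # data drives in an (8 + 2) RAID 6 set
    PARITY_DRIVES = 2         # parity drives in an (8 + 2) RAID 6 set
    SPARES = 2                # wandering spares defined per server
    DRIVE_TB = 3              # nominal drive capacity in TB

    raid_drives = RAID_SETS * (DATA_DRIVES + PARITY_DRIVES)   # 60 drives
    accounted = raid_drives + SPARES                           # 62 of 64 drives
    per_set_data_tb = DATA_DRIVES * DRIVE_TB                   # 24 TB per RAID set

    print(f"drives in RAID 6 sets : {raid_drives}")
    print(f"plus wandering spares : {accounted} of {DRIVES_PER_SERVER}")
    print(f"data capacity per set : {per_set_data_tb} TB")
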
To aggregate the storage hardware, the Lustre file system [1] is used to
provide three parallel file systems that are available system-wide (HOME,
WORK, and SCRATCH). The largest and most capable of these three file
systems is SCRATCH, which is supported by 58 of the 76 servers. With six
RAID devices defined per server, this equates to a total of 348 OSTs defined
within Lustre for SCRATCH, resulting in a total of 7.4 PB of user-addressable
storage. The adoption of multiple parallel file systems is common across TACC
HPC system deployments and is designed to provide users with tiered storage
alternatives that vary in size, performance, and file longevity. As a general-
purpose compute platform for the national academic community, Stampede
must support thousands of users during its operational lifespan; indeed, more
than 2,400 individual users ran jobs during the first seven months of opera-
tion. Consequently, fairly modest quotas are implemented for the HOME and
WORK file systems. The user quota limits and associated target usage modes
for each file system are presented in Figure 7.1.
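
The aggregate SCRATCH figures quoted above can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes eight 3-TB data drives per OST, as described earlier, and interprets the quoted 7.4 PB as a binary (PiB) figure; the small remaining difference would be absorbed by file system formatting overhead. Both points are interpretations rather than statements from the text.

    # Sketch: reproducing the SCRATCH OST count and capacity from the
    # per-server figures above (the PB-to-PiB reading is an assumption).
    SCRATCH_SERVERS = 58      # servers backing SCRATCH
    OSTS_PER_SERVER = 6       # RAID devices (OSTs) per server
    DATA_DRIVES_PER_OST = 8   # data drives in each (8 + 2) RAID 6 set
    DRIVE_TB = 3              # nominal drive capacity in TB

    osts = SCRATCH_SERVERS * OSTS_PER_SERVER                # 348 OSTs
    raw_data_tb = osts * DATA_DRIVES_PER_OST * DRIVE_TB     # 8,352 TB of data capacity
    pib = raw_data_tb * 10**12 / 2**50                      # roughly 7.4 PiB

    print(f"OSTs for SCRATCH : {osts}")
    print(f"raw data capacity: {raw_data_tb} TB (~{pib:.1f} PiB)")
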
An important step in the deployment of a production file system is the
completion of a burn-in process for all of the hard drives that support the
file system. In Stampede's case, there are over 4,800 drives installed in the
OSSs and the peak global file system performance is contingent on ensuring
consistent performance across all of these drives. The file system burn-in pro-
cess on Stampede was carried out over the course of one month (in tandem
with the compute-hardware deployment) and was performed in a two-step
process. First, low-level I/O tests were performed to every disk sector on each
drive in the OSSs. These low-level disk tests ran for multiple 24-hour peri-
ods and were used to identify and replace slow performers along with drives
 