processors. Each compute core has a peak performance of 8.4 GFlops, resulting in a system with a peak performance of 1.28 PFlops. All but 384 compute nodes have 32 GB of memory, while the remaining larger nodes have 64 GB, giving the system over 217 TB of memory in total. Hopper employs the Gemini interconnect with a 3D torus topology.
Hopper has two identical local Lustre parallel file systems, /scratch and /scratch2; each has a peak performance of 35 GB/s and a capacity of 1.1 PB. A Lustre file system is built from an underlying set of I/O servers and disks, called Object Storage Servers (OSSs) and Object Storage Targets (OSTs), respectively. Each /scratch file system has 26 OSSs and 156 OSTs. When a file is created in /scratch, it is "striped," or split, across a specified number of OSTs. The default stripe count on Hopper is 2 and the default stripe size is 1 MB.
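As an illustration of how an application can request different striping parameters, the hedged C sketch below passes Lustre striping hints to MPI-I/O at file creation time. The hint names "striping_factor" and "striping_unit" follow the ROMIO/Cray MPI-IO convention; the file path and the chosen values are illustrative assumptions, not the settings used in the study.

#include <mpi.h>
#include <stdio.h>

/* Hedged sketch: request a stripe count of 8 and a 1 MB stripe size
 * through MPI-I/O hints when creating a file on Lustre. Striping hints
 * only take effect when the file is created; the path and values here
 * are illustrative assumptions. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "8");      /* stripe count */
    MPI_Info_set(info, "striping_unit", "1048576");  /* stripe size in bytes */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/scratch/example_output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}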
19.4.2 Software Setup
The study used Cray's MPICH2 library (xt-mpt 5.1.2) for running VPIC. The I/O module used HDF5 version 1.8.8, and the particle data was written with H5Part version 1.6.5 along with Cray's MPI-I/O implementation. H5Part is a veneer API built on top of HDF5 that improves ease of use when writing particle data.
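To make the write path concrete, the hedged C sketch below writes one timestep of per-particle coordinates through the H5Part interface. The file name, particle count, and property names are illustrative assumptions, and error checking is omitted for brevity; this is not the study's I/O module.

#include <mpi.h>
#include "H5Part.h"

/* Hedged sketch of the H5Part write pattern: each rank declares how many
 * particles it owns, then writes one dataset per particle property for
 * the current timestep. Names and counts are illustrative assumptions. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const h5part_int64_t nparticles = 1000;  /* particles owned by this rank (assumed) */
    h5part_float64_t x[1000], y[1000], z[1000];
    for (h5part_int64_t i = 0; i < nparticles; i++)
        x[i] = y[i] = z[i] = (h5part_float64_t)i;

    H5PartFile *file = H5PartOpenFileParallel("particles.h5part",
                                              H5PART_WRITE, MPI_COMM_WORLD);
    H5PartSetStep(file, 0);                   /* select timestep 0 */
    H5PartSetNumParticles(file, nparticles);  /* per-rank particle count */
    H5PartWriteDataFloat64(file, "x", x);
    H5PartWriteDataFloat64(file, "y", y);
    H5PartWriteDataFloat64(file, "z", z);
    H5PartCloseFile(file);

    MPI_Finalize();
    return 0;
}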
19.5 Parallel I/O in VPIC
VPIC uses 20,000 MPI processes, where each MPI process spawns 6 OpenMP threads to perform computations.¹ An overview of the VPIC simulation setup on Hopper is shown in Figure 19.1. Each OpenMP thread runs on a CPU core, and the total number of CPU cores used in this simulation is 120,000. The figure also shows MPI-I/O aggregators that collect data from multiple MPI domains before writing data to the Lustre file system.
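The hybrid decomposition can be pictured with a minimal MPI+OpenMP skeleton such as the hedged sketch below. This is not VPIC's code; the process and thread counts described above come from the job launch, not from anything set inside the program.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* Hedged sketch of the hybrid layout described above: one MPI domain
 * (process) with several OpenMP threads doing the computation. Launching
 * 20,000 ranks with 6 threads each would occupy 120,000 cores; those
 * counts are set by the job launcher, not by this code. */
int main(int argc, char **argv)
{
    int provided, rank, nranks;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    #pragma omp parallel
    {
        /* Each thread would work on a portion of this MPI domain's particles. */
        if (rank == 0 && omp_get_thread_num() == 0)
            printf("%d MPI domains x %d OpenMP threads per domain\n",
                   nranks, omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}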
To write the particle data, the study uses an extension of parallel HDF5 called H5Part [8]. Parallel HDF5 has demonstrated competitive I/O rates on modern computational platforms [9]. The H5Part extension to HDF5 improves the ease of managing large particle counts. H5Part is a veneer API for HDF5: H5Part files are also valid HDF5 files and are compatible with other HDF5-based interfaces and tools. By constraining the use case to particle-based simulations, H5Part is able to encapsulate much of the complexity of implementing effective parallel I/O in HDF5. That is, it trades off HDF5's
¹ In subsequent discussion, "MPI process" is referred to as an "MPI domain" in order to highlight the fact that each MPI process has multiple OpenMP threads.
 