18.3 Deployment, Usage, and Applications
GLEAN has been deployed on leadership computing systems at Argonne
for I/O acceleration, asynchronous data staging, and in situ visualization for
a number of diverse applications.
18.3.1 Checkpoint, Restart, and Analysis I/O for HACC Cosmology
Next-generation sky surveys will map billions of galaxies to explore the physics of the "Dark Universe." Science requirements for these surveys demand simulations at extreme scales; these will be delivered by the HACC (Hardware/Hybrid Accelerated Cosmology Code) framework [5]. GLEAN was integrated with the HACC simulation and its I/O performance was evaluated.
A weak scaling study was performed on the Mira BG/Q system using core counts from 16,384 to 262,144, with the total number of particles ranging from 2048³ to 5012³. The total data per rank varied between 38 MB and 57 MB, i.e., 1 to 1.5 million particles per rank. Thus, 400 GB were written at 16,000 cores and 4.8 TB at 256,000 cores.
HACC's I/O performance was compared using four configurations: (1) MPI collective I/O to a single shared file; (2) subfiling with a file per I/O network; (3) topology-aware aggregation combined with subfiling; and (4) all three components of the framework, i.e., compression, aggregation, and subfiling. The simulation was run for 10 steps with each I/O configuration, and the maximum performance observed was reported for each.
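To make configurations (1) and (2) concrete, the following sketch contrasts collective writes to a single shared file with subfiling into one file per subgroup of ranks. It is a minimal illustration only: the subgroup size, file names, and the fixed per-rank buffer size are assumptions chosen for the example and do not reflect GLEAN's actual implementation.

/* Minimal sketch: single shared file vs. subfiling (illustrative only). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define BYTES_PER_RANK (38UL * 1024 * 1024)   /* assumed per-rank payload */
#define RANKS_PER_SUBFILE 512                 /* assumed subgroup size    */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *buf = malloc(BYTES_PER_RANK);  /* particle data would live here */

    /* Configuration (1): all ranks write collectively to one shared file. */
    MPI_File fh;
    MPI_Offset off = (MPI_Offset)rank * BYTES_PER_RANK;
    MPI_File_open(MPI_COMM_WORLD, "hacc_shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, off, buf, BYTES_PER_RANK, MPI_BYTE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    /* Configuration (2): subfiling -- split ranks into groups, one file each. */
    int color = rank / RANKS_PER_SUBFILE;
    MPI_Comm sub;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &sub);

    int subrank;
    MPI_Comm_rank(sub, &subrank);
    char fname[64];
    snprintf(fname, sizeof(fname), "hacc_part_%d.dat", color);

    MPI_File sfh;
    MPI_Offset soff = (MPI_Offset)subrank * BYTES_PER_RANK;
    MPI_File_open(sub, fname, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &sfh);
    MPI_File_write_at_all(sfh, soff, buf, BYTES_PER_RANK, MPI_BYTE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&sfh);

    MPI_Comm_free(&sub);
    free(buf);
    MPI_Finalize();
    return 0;
}

Keeping the number of files well below the number of ranks, while still avoiding a single file, bounds the metadata traffic that a lone shared file concentrates on the parallel file system at scale.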
Various I/O configurations show different performance rates, as depicted in Figure 18.3. Subfiling yielded a 5× improvement at 16,000 cores and a 10× improvement at 256,000 cores over a single shared file. Thus, subfiling is of critical importance, as it mitigates the impact of parallel file system metadata overheads at scale. Topology-aware aggregation yields a 20% improvement over subfiling at 16,000 cores and up to an 80% improvement over subfiling at 256,000 cores. Thus, leveraging the system topology plays an increasingly important role when scaling to larger core counts.
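GLEAN's aggregation exploits the BG/Q torus topology and the mapping of compute nodes to I/O nodes. The sketch below only approximates the idea by electing one aggregator per shared-memory node (an assumption made for illustration); the aggregator gathers its peers' equally sized buffers and issues one large write. The file handle fh and the aggregator's base offset base_off are presumed to be prepared by the caller.

/* Sketch: topology-aware aggregation -- one aggregator per compute node
 * gathers its node's data and issues a single larger write. GLEAN maps
 * aggregators to the machine's I/O topology; a shared-memory split is
 * used here only as an approximation. Assumes equal nbytes on all ranks
 * and that nbytes * ranks-per-node fits in an int. */
#include <mpi.h>
#include <stdlib.h>

void aggregate_and_write(const char *buf, int nbytes, MPI_File fh,
                         MPI_Offset base_off)
{
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);

    int nrank, nsize;
    MPI_Comm_rank(node, &nrank);
    MPI_Comm_size(node, &nsize);

    /* Gather every local rank's payload onto the node-local aggregator. */
    char *agg = NULL;
    if (nrank == 0)
        agg = malloc((size_t)nbytes * nsize);
    MPI_Gather(buf, nbytes, MPI_BYTE, agg, nbytes, MPI_BYTE, 0, node);

    /* Only aggregators touch the file system, with large contiguous writes. */
    if (nrank == 0) {
        MPI_File_write_at(fh, base_off, agg, nbytes * nsize, MPI_BYTE,
                          MPI_STATUS_IGNORE);
        free(agg);
    }
    MPI_Comm_free(&node);
}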
Using compression as well, i.e., all three components of the I/O framework, yielded an additional 40% increase in performance over aggregation alone. This is primarily due to the ability to achieve 50% compression for the HACC datasets and thus write less data to the storage system.
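Since this final gain is attributed to roughly 50% compression of the HACC data, a brief sketch of pre-write compression follows. zlib serves purely as a stand-in compressor (the section does not name the algorithm GLEAN applies), and the helper compress_block is hypothetical.

/* Sketch: compress the per-rank buffer before writing (zlib as a stand-in). */
#include <zlib.h>
#include <stdlib.h>

/* Returns a newly allocated compressed buffer and its size, or NULL. */
unsigned char *compress_block(const unsigned char *in, size_t in_len,
                              size_t *out_len)
{
    uLongf bound = compressBound(in_len);
    unsigned char *out = malloc(bound);
    if (!out)
        return NULL;
    if (compress2(out, &bound, in, in_len, Z_BEST_SPEED) != Z_OK) {
        free(out);
        return NULL;
    }
    *out_len = bound;   /* e.g., roughly half of in_len for HACC data */
    return out;
}

Because compressed sizes differ across ranks, file offsets can no longer be derived from the rank index alone; in practice each rank's offset would come from an exclusive prefix sum (e.g., MPI_Exscan) of the compressed sizes.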
aggregation, and compression, a 10 improvement was observed over MPI col-
lective I/O at 16,000 cores and a 14 improvement at 256,000 cores, achieving
130 GB/s. Thus, all three are critical to achieving scalable parallel I/O per-
formance. The combination of compression, topology-aware data aggregation,
and the subfiling mechanism improved I/O performance on the HACC appli-
cation multifold. Thus, achieving optimal I/O system performance required