18.3 Deployment, Usage, and Applications
GLEAN has been deployed on leadership computing systems at Argonne
for I/O acceleration, asynchronous data staging, and in situ visualization for
a number of diverse applications.
18.3.1 Checkpoint, Restart, and Analysis I/O for HACC Cosmology
Next-generation sky surveys will map billions of galaxies to explore the physics of the "Dark Universe." Science requirements for these surveys demand simulations at extreme scales; these will be delivered by the HACC (Hardware/Hybrid Accelerated Cosmology Code) framework [5]. GLEAN was integrated with the HACC simulation and its I/O performance was evaluated.
A weak scaling study was performed on the Mira BG/Q system using core counts from 16,384 to 262,144, with the total number of particles ranging from 2048³ to 5012³. The total data per rank varied between 38 MB and 57 MB, i.e., 1 to 1.5 million particles per rank. Thus, 400 GB were written at 16,000 cores and 4.8 TB at 256,000 cores.
HACC's I/O performance was compared using four configurations: (1) MPI collective I/O to a single shared file; (2) subfiling with a file per I/O network; (3) topology-aware aggregation combined with subfiling; and (4) all three components of the framework, i.e., compression, aggregation, and subfiling. The simulation was run for 10 steps with each I/O configuration, and the maximum performance observed was reported for each.
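To make configurations (1) and (2) concrete, the following sketch contrasts collective writes to a single shared file with subfiling into one file per subgroup of ranks. It is a minimal illustration only: the subgroup size, file names, and the fixed per-rank buffer size are assumptions chosen for the example and do not reflect GLEAN's actual implementation.

/* Minimal sketch: single shared file vs. subfiling (illustrative only). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define BYTES_PER_RANK (38UL * 1024 * 1024)   /* assumed per-rank payload */
#define RANKS_PER_SUBFILE 512                 /* assumed subgroup size    */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *buf = malloc(BYTES_PER_RANK);  /* particle data would live here */

    /* Configuration (1): all ranks write collectively to one shared file. */
    MPI_File fh;
    MPI_Offset off = (MPI_Offset)rank * BYTES_PER_RANK;
    MPI_File_open(MPI_COMM_WORLD, "hacc_shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, off, buf, BYTES_PER_RANK, MPI_BYTE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    /* Configuration (2): subfiling -- split ranks into groups, one file each. */
    int color = rank / RANKS_PER_SUBFILE;
    MPI_Comm sub;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &sub);

    int subrank;
    MPI_Comm_rank(sub, &subrank);
    char fname[64];
    snprintf(fname, sizeof(fname), "hacc_part_%d.dat", color);

    MPI_File sfh;
    MPI_Offset soff = (MPI_Offset)subrank * BYTES_PER_RANK;
    MPI_File_open(sub, fname, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &sfh);
    MPI_File_write_at_all(sfh, soff, buf, BYTES_PER_RANK, MPI_BYTE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&sfh);

    MPI_Comm_free(&sub);
    free(buf);
    MPI_Finalize();
    return 0;
}

Keeping the number of files well below the number of ranks, while still avoiding a single file, bounds the metadata traffic that a lone shared file concentrates on the parallel file system at scale.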
Various I/O configurations show different performance rates, as depicted in Figure 18.3. Subfiling yielded a 5× improvement at 16,000 cores and a 10× improvement at 256,000 cores over a single shared file. Thus, subfiling is of critical importance, as it mitigates the impact of parallel file system metadata overheads at scale. Topology-aware aggregation yields a 20% improvement over subfiling at 16,000 cores and up to an 80% improvement over subfiling at 256,000 cores. Thus, leveraging the system topology plays an increasingly important role when scaling to larger core counts.
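GLEAN's aggregation exploits the BG/Q torus topology and the mapping of compute nodes to I/O nodes. The sketch below only approximates the idea by electing one aggregator per shared-memory node (an assumption made for illustration); the aggregator gathers its peers' equally sized buffers and issues one large write. The file handle fh and the aggregator's base offset base_off are presumed to be prepared by the caller.

/* Sketch: topology-aware aggregation -- one aggregator per compute node
 * gathers its node's data and issues a single larger write. GLEAN maps
 * aggregators to the machine's I/O topology; a shared-memory split is
 * used here only as an approximation. Assumes equal nbytes on all ranks
 * and that nbytes * ranks-per-node fits in an int. */
#include <mpi.h>
#include <stdlib.h>

void aggregate_and_write(const char *buf, int nbytes, MPI_File fh,
                         MPI_Offset base_off)
{
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);

    int nrank, nsize;
    MPI_Comm_rank(node, &nrank);
    MPI_Comm_size(node, &nsize);

    /* Gather every local rank's payload onto the node-local aggregator. */
    char *agg = NULL;
    if (nrank == 0)
        agg = malloc((size_t)nbytes * nsize);
    MPI_Gather(buf, nbytes, MPI_BYTE, agg, nbytes, MPI_BYTE, 0, node);

    /* Only aggregators touch the file system, with large contiguous writes. */
    if (nrank == 0) {
        MPI_File_write_at(fh, base_off, agg, nbytes * nsize, MPI_BYTE,
                          MPI_STATUS_IGNORE);
        free(agg);
    }
    MPI_Comm_free(&node);
}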
Using compression as well, i.e., all three components of the I/O framework, yielded an additional 40% increase in performance over aggregation alone. This is primarily due to the ability to achieve 50% compression for the HACC datasets and thus write less data to the storage system.
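Since this final gain is attributed to roughly 50% compression of the HACC data, a brief sketch of pre-write compression follows. zlib serves purely as a stand-in compressor (the section does not name the algorithm GLEAN applies), and the helper compress_block is hypothetical.

/* Sketch: compress the per-rank buffer before writing (zlib as a stand-in). */
#include <zlib.h>
#include <stdlib.h>

/* Returns a newly allocated compressed buffer and its size, or NULL. */
unsigned char *compress_block(const unsigned char *in, size_t in_len,
                              size_t *out_len)
{
    uLongf bound = compressBound(in_len);
    unsigned char *out = malloc(bound);
    if (!out)
        return NULL;
    if (compress2(out, &bound, in, in_len, Z_BEST_SPEED) != Z_OK) {
        free(out);
        return NULL;
    }
    *out_len = bound;   /* e.g., roughly half of in_len for HACC data */
    return out;
}

Because compressed sizes differ across ranks, file offsets can no longer be derived from the rank index alone; in practice each rank's offset would come from an exclusive prefix sum (e.g., MPI_Exscan) of the compressed sizes.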
aggregation, and compression, a 10 improvement was observed over MPI col-
lective I/O at 16,000 cores and a 14 improvement at 256,000 cores, achieving
130 GB/s. Thus, all three are critical to achieving scalable parallel I/O per-
formance. The combination of compression, topology-aware data aggregation,
and the subfiling mechanism improved I/O performance on the HACC appli-
cation multifold. Thus, achieving optimal I/O system performance required