FIGURE 19.3: Performance improvement with patching HDF5 truncate.
Experiments were conducted with VPIC-IOBench using 8,000 tasks, varying the stripe
count from 64 OSTs to the maximum of 156 and the stripe size from 1 MB to
1 GB. The Cray Lustre-aware MPI-I/O implementation varies the number of MPI-I/O
collective buffer aggregators and their buffer size to match the corresponding
stripe count and stripe size. Prior experiments indicated that the attainable
data rate did not increase with stripe counts beyond 144. The last few OSTs
did not add any performance. Based on these results, 144 OSTs with a stripe size
of 64 MB were chosen.
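Striping and aggregator settings of this kind are typically conveyed either with the lfs setstripe command on the output directory or through an MPI_Info object passed to HDF5 when the file is created. The following C sketch illustrates the latter approach with the tuned values (144 OSTs, 64 MB stripes); it is not code from the study, the function name is illustrative, and the hint names shown (striping_factor, striping_unit, cb_nodes, cb_buffer_size) are standard ROMIO/Cray MPI-I/O hints whose exact behavior depends on the installation.

#include <hdf5.h>
#include <mpi.h>

/* Create an HDF5 file for collective parallel writes, passing Lustre
 * striping and collective-buffering hints that mirror the tuned values
 * discussed in the text.  Striping hints are only honored when the file
 * is newly created. */
hid_t create_tuned_file(MPI_Comm comm, const char *filename)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "144");     /* stripe count (OSTs) */
    MPI_Info_set(info, "striping_unit", "67108864");  /* 64 MB stripe size   */
    MPI_Info_set(info, "cb_nodes", "144");            /* aggregators = OSTs  */
    MPI_Info_set(info, "cb_buffer_size", "67108864"); /* buffer = stripe size*/

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, info);               /* MPI-I/O file driver */

    hid_t file = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);
    MPI_Info_free(&info);
    return file;
}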
The results of a scaling study for 1,000 to 128,000 MPI tasks are shown in
Figure 19.4. This is a weak scaling study in that the number of particles per
task was constant at eight million. These experiments use the modified HDF5
library that has the patch for disabling file size verification when an HDF5 file
closes. The I/O rate increases as the number of MPI tasks grows.
With fewer MPI tasks running on a highly shared system, such as Hopper,
interference from I/O activity of other jobs can reduce the attained I/O rate.
At the scale of 128,000 cores, VPIC-IOBench occupies 85% of Hopper, which
reduces the opportunity for interference from other jobs sharing the I/O sys-
tem. The 128,000-task instance writes about 32 TB of data, and Figure 19.4
shows that at that scale, the delivered I/O performance is about 27 GB/s,
which compares favorably with the rated maximum on Hopper of about 35
GB/s. It is also comparable with the best rates achieved with a file per pro-
cess (fpp) model. The trillion-particle VPIC simulation uses the same values
for Lustre striping as those selected when tuning the VPIC-IOBench kernel. As
shown in Figure 19.5, the simulation achieves the peak I/O rate of the Hopper
file system when writing each of the eight variables.
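The 32 TB figure is consistent with the weak-scaling parameters if one assumes (an assumption, not stated above) that each of the eight per-particle variables is written as a 4-byte single-precision value:

128,000 tasks × 8,000,000 particles per task × 8 variables × 4 bytes ≈ 32.8 × 10^12 bytes, i.e., about 32 TB.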