26.2.4 HPC Workload Studies
The previous examples examined individual performance issues. In the
broader context, however, an HPC workload running on a shared computing
resource requires analysis of many such profiles [11], allowing I/O
performance to be connected directly to sustained performance at the HPC
system level. When all applications in a workload are profiled, a convenient
first step is to segment the workload into classes of distinct or similar
applications according to the I/O and messaging functions each code invokes.
Figure 26.7 shows 1053 jobs run on the Magellan cluster at NERSC. Not all
codes invoke all functions, and the set of invoked calls provides a reasonable
first basis for classifying applications.
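The function-based segmentation described above can be sketched as grouping jobs by the set of I/O and messaging calls each one invokes; jobs with identical fingerprints fall into the same class. This is a minimal illustration — the job names and function lists below are invented, not data from the Magellan study:

```python
# Sketch: classify jobs by the set of functions each one invokes.
# Job names and function sets are illustrative, not real profile data.
from collections import defaultdict

jobs = {
    "job-001": {"MPI_Bcast", "MPI_Allreduce", "write", "read"},
    "job-002": {"MPI_Bcast", "MPI_Allreduce", "write", "read"},
    "job-003": {"MPI_Isend", "MPI_Irecv", "open", "write"},
}

# Jobs sharing an identical function fingerprint land in the same class.
classes = defaultdict(list)
for job, funcs in jobs.items():
    classes[frozenset(funcs)].append(job)

for fingerprint, members in sorted(classes.items(), key=lambda kv: kv[1]):
    print(sorted(fingerprint), "->", sorted(members))
```

Here jobs 001 and 002 would be grouped as one application class and job 003 as another, mirroring the first-pass segmentation of the 1053-job workload.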
Jobs showing identical or similar function-level profiles can then be
examined side by side to identify patterns in I/O performance bottlenecks.
It is worth noting that such a production-integrated approach is one part
of moving the HPC performance tools community away from well-controlled
singleton performance experiments and toward a streaming data-analytics
viewpoint that pursues the same goal of making I/O fast. Clustering and
classification of in-vivo I/O measurements is a different paradigm from
performance/cost models built on independent parallel flows. There are
many ways for applications as actually run by scientists to
FIGURE 26.7: Large numbers of jobs in a workload may be difficult to identify
by name or other job metadata. The 1053 jobs profiled above can be examined
from a workload perspective by examining the calls each job invokes. In the
figure above, functions are given an ordinal index that groups calls by MPI,
MPI Collective, and POSIX I/O.
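For the side-by-side examination of similar (rather than strictly identical) profiles, a simple set-similarity measure can rank how close two jobs' function fingerprints are. A minimal sketch using Jaccard similarity — the measure and the sample profiles are assumptions for illustration, not the book's method:

```python
# Sketch: compare two function-level profiles with Jaccard similarity,
# so near-identical jobs can be grouped even when their fingerprints
# differ slightly. Profiles are illustrative.
def jaccard(a: set, b: set) -> float:
    """Ratio of shared functions to all functions invoked by either job."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

prof_a = {"MPI_Bcast", "MPI_Allreduce", "write", "read"}
prof_b = {"MPI_Bcast", "MPI_Allreduce", "write", "read", "lseek"}

print(jaccard(prof_a, prof_b))  # 4 shared calls of 5 total -> 0.8
```

A threshold on this score (e.g., group jobs with similarity above 0.8) would give one simple way to cluster in-vivo measurements into candidate application classes.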