26.2.4 HPC Workload Studies
The previous examples examined individual performance issues. In the
broader context, however, an HPC workload running on a shared computing
resource requires analysis of many such profiles [11], allowing I/O
performance to be connected directly to sustained performance at the HPC
system level. When all applications in a workload are profiled, a convenient
first step is to segment the workload into classes of distinct or similar
applications according to the I/O and messaging functions each code invokes.
Figure 26.7 shows 1053 jobs run on the Magellan cluster at NERSC. Not all
codes invoke all functions, and the set of invoked calls provides a reasonable
first basis for classifying applications.
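The function-based segmentation described above can be sketched as grouping jobs by the set of I/O and messaging calls each one invokes; jobs with identical fingerprints fall into the same class. This is a minimal illustration — the job names and function lists below are invented, not data from the Magellan study:

```python
# Sketch: classify jobs by the set of functions each one invokes.
# Job names and function sets are illustrative, not real profile data.
from collections import defaultdict

jobs = {
    "job-001": {"MPI_Bcast", "MPI_Allreduce", "write", "read"},
    "job-002": {"MPI_Bcast", "MPI_Allreduce", "write", "read"},
    "job-003": {"MPI_Isend", "MPI_Irecv", "open", "write"},
}

# Jobs sharing an identical function fingerprint land in the same class.
classes = defaultdict(list)
for job, funcs in jobs.items():
    classes[frozenset(funcs)].append(job)

for fingerprint, members in sorted(classes.items(), key=lambda kv: kv[1]):
    print(sorted(fingerprint), "->", sorted(members))
```

Here jobs 001 and 002 would be grouped as one application class and job 003 as another, mirroring the first-pass segmentation of the 1053-job workload.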
Jobs showing identical or similar function-level profiles can then be
examined side by side to identify patterns in I/O performance bottlenecks.
It is worth noting that such a production-integrated approach is one part
of moving the HPC performance tools community away from well-controlled
singleton performance experiments and toward a streaming data-analytics
viewpoint that pursues the same goal of making I/O fast. Clustering and
classification of in-vivo I/O measurements is a different paradigm from
performance/cost models built on independent parallel flows. There are
many ways for applications as actually run by scientists to
FIGURE 26.7: Large numbers of jobs in a workload may be difficult to identify
by name or other job metadata. The 1053 jobs profiled above can be examined
from a workload perspective by examining the calls each job invokes. In the
figure above, functions are given an ordinal index that groups calls by MPI,
MPI Collective, and POSIX I/O.
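For the side-by-side examination of similar (rather than strictly identical) profiles, a simple set-similarity measure can rank how close two jobs' function fingerprints are. A minimal sketch using Jaccard similarity — the measure and the sample profiles are assumptions for illustration, not the book's method:

```python
# Sketch: compare two function-level profiles with Jaccard similarity,
# so near-identical jobs can be grouped even when their fingerprints
# differ slightly. Profiles are illustrative.
def jaccard(a: set, b: set) -> float:
    """Ratio of shared functions to all functions invoked by either job."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

prof_a = {"MPI_Bcast", "MPI_Allreduce", "write", "read"}
prof_b = {"MPI_Bcast", "MPI_Allreduce", "write", "read", "lseek"}

print(jaccard(prof_a, prof_b))  # 4 shared calls of 5 total -> 0.8
```

A threshold on this score (e.g., group jobs with similarity above 0.8) would give one simple way to cluster in-vivo measurements into candidate application classes.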