diverge from their sketches as benchmarks. That this is observed in the few
systemwide profiling studies conducted on production HPC is an indication
that the performance tools community may do well to shift its perspective to
one of sustained performance in a production setting.
In order to target workload-level improvements in I/O, it is useful to
prioritize the opportunities for analysis by their contribution to the overall
core hours expended by the workload. One such prioritization is given in
Figure 26.7, where a treemap organized by core hours vs. user and concurrency
clarifies that some profiles are worth more attention than others. Neither
function-level profiles nor user-concurrency pairs conclusively segment
the workload by distinct applications. In practice, however, on production
machines such as those at NERSC, one can often find clear I/O performance patterns
using such filters. Performance profile classes that are sufficiently similar and
persist across many jobs point to I/O strategies that need to be improved
at the application level or in user environment settings. Figure 26.4 shows
one such case: a single code run on 128 tasks by a single user. The I/O strategy shows
persistent load imbalance between MPI ranks with some tasks transacting
substantially more I/O than others. In other IPM case studies, several many-
to-few and many-to-one I/O strategies were identified with less than optimal
I/O rates.
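
The two filters described above can be sketched in a few lines of Python. This is a minimal illustration, not IPM's implementation: the job record keys ('user', 'ntasks', 'wall_hours') are hypothetical, core hours are assumed to be tasks times wall-clock hours (one core per task), and max/mean is only one of several reasonable imbalance measures.

```python
from collections import defaultdict


def rank_by_core_hours(jobs):
    """Group jobs by (user, concurrency) and rank the groups by core hours.

    Each job is a dict with the hypothetical keys 'user', 'ntasks' and
    'wall_hours'. Core hours are assumed to be ntasks * wall_hours.
    Returns a list of ((user, ntasks), core_hours, share_of_total),
    largest consumer first.
    """
    totals = defaultdict(float)
    for job in jobs:
        key = (job["user"], job["ntasks"])
        totals[key] += job["ntasks"] * job["wall_hours"]

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (key, hours, hours / grand_total if grand_total else 0.0)
        for key, hours in ranked
    ]


def io_imbalance(bytes_per_rank):
    """Return max/mean of per-rank I/O volume.

    A value of 1.0 means every rank moved the same amount of data; a value
    equal to the number of ranks means a single rank did all of the I/O.
    """
    if not bytes_per_rank:
        raise ValueError("need at least one rank")
    mean = sum(bytes_per_rank) / len(bytes_per_rank)
    return max(bytes_per_rank) / mean if mean > 0 else 1.0
```

Groups at the top of the ranking whose jobs repeatedly show a high imbalance ratio are the natural first candidates for application-level I/O changes.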
26.3 Conclusion
Profiling I/O through the interception of POSIX-I/O calls provides a scalable
and generalizable path to understanding application I/O performance.
Examples of using IPM for this purpose demonstrate some commonly encountered
aspects of I/O worth consideration to improve performance. This
technique is shared by other tools such as TAU (see Chapter 25) and Darshan (see
Chapter 27), and the performance analysis is generalizable to
other tools that provide traces or profiles of POSIX calls. Determining the root
cause of an I/O performance problem with a single tool can be complex. Some
approaches involving simultaneous profiling of application and file system [10, 8]
have been explored to allow more conclusive determination of I/O performance in
relation to file system capabilities. I/O bottlenecks that originate within an
application or that arise from persistent contention for I/O can often be addressed
without identifying a specific root cause. Profiling POSIX-I/O calls can, as demonstrated in the
above examples, provide sufficient visibility into I/O performance to improve
it. By addressing sources of load imbalance and poorly scaling I/O, better
strategies can be implemented for applications and workloads. IPM focuses
on achieving such I/O performance gains in the context of production HPC
applications and workloads, because computers should be fast for a reason.
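
The interception idea summarized above can be illustrated at small scale in Python. The sketch below is an analogy only and does not reproduce IPM's mechanism: it assumes that wrapping Python's os.read and os.write is an acceptable stand-in for intercepting the underlying POSIX calls, and the names io_stats, install and uninstall are hypothetical.

```python
import os
import time
from collections import defaultdict

# Per-call statistics: call name -> [call count, total bytes, total seconds].
io_stats = defaultdict(lambda: [0, 0, 0.0])


def _intercept(name, bytes_of):
    """Replace os.<name> with a timing wrapper and return the original."""
    original = getattr(os, name)

    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = original(*args, **kwargs)
        elapsed = time.perf_counter() - start
        entry = io_stats[name]
        entry[0] += 1                  # number of calls
        entry[1] += bytes_of(result)   # bytes transferred by this call
        entry[2] += elapsed            # time spent inside the call
        return result

    setattr(os, name, wrapper)
    return original


def install():
    """Start intercepting os.read and os.write; return the originals."""
    return {
        "read": _intercept("read", len),    # os.read returns a bytes object
        "write": _intercept("write", int),  # os.write returns a byte count
    }


def uninstall(originals):
    """Restore the functions saved by install()."""
    for name, function in originals.items():
        setattr(os, name, function)
```

Calling install() before a region of interest and uninstall() afterwards leaves io_stats holding call counts, byte totals, and elapsed time per call type, which is the kind of per-call summary the analyses in this chapter start from.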
 