FIGURE 26.4: The POSIX and MPI-IO calls profiled by IPM's I/O module.
Operations such as mmap are often correctly intercepted by POSIX library calls, as they communicate with the kernel through system calls. IPM intercepts POSIX I/O and reports performance profiles through its standard double open-address hashing technique. An inventory of the I/O conducted during execution, with data volumes and timings, is reported when execution is complete. A pre-configured list of the POSIX calls supported by IPM is prototyped in the file ipm.h.
A C library (preloaded or linked) is used to instrument execution. It can be connected to the execution and/or compile-time environment through modules or user environment customization. When adopted by user communities or HPC user facilities, IPM provides a barometer for sustained application performance delivered in the actual context of using the computer as a tool for science. A shared log directory collects the body of performance profiles, each in its own XML file. One or more XML files can be parsed with the IPM parse utility or with a wide variety of custom parsers and report generators. This data collection strategy raises scalability concerns: very large numbers of files and the time spent in XML parsing are often the first limits to be stretched. Large-scale data analytics thus have a role in HPC performance engineering, in addition to their more familiar roles in physics, chemistry, and biology.
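A custom parser along these lines can be sketched as follows; the XML schema here is invented for illustration and does not reproduce IPM's actual log format:

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Two hypothetical per-job profiles; in practice these would be files
# collected from a shared log directory.
profiles = [
    "<profile><io call='write' bytes='1048576' time='0.21'/>"
    "<io call='read' bytes='524288' time='0.09'/></profile>",
    "<profile><io call='write' bytes='2097152' time='0.40'/></profile>",
]

# Aggregate bytes and seconds per POSIX call across all profiles.
totals = {}
for text in profiles:
    for entry in ET.parse(StringIO(text)).getroot():
        call = entry.get("call")
        b, t = totals.get(call, (0, 0.0))
        totals[call] = (b + int(entry.get("bytes")),
                        t + float(entry.get("time")))

print(totals["write"])  # aggregated (bytes, seconds) for write
```

With many thousands of such files, the per-file parse cost is exactly where the scalability limits mentioned above begin to bite.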
It is important to recognize that IPM modules may be nested hierarchically. The MPI-IO layer will often invoke POSIX calls for the I/O transactions to the file system, and IPM intercepts and reports both layers. This can provide corroborating evidence of I/O performance loss, but the nesting must be recognized so as not to double count the reported I/O volumes and times.
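The bookkeeping needed to avoid double counting can be illustrated with made-up numbers: subtract the nested POSIX time from the inclusive MPI-IO time, and attribute the byte volume to a single layer only:

```python
# Illustrative numbers, not IPM output: an MPI-IO write wrapper measured
# 1.00 s inclusive, during which nested POSIX writes took 0.80 s for 64 MiB.
mpiio_inclusive_time = 1.00
posix_nested_time = 0.80
posix_nested_bytes = 64 * 2**20

# Exclusive MPI-IO time (library overhead above the POSIX layer):
mpiio_exclusive_time = mpiio_inclusive_time - posix_nested_time

# Count the bytes once, at the layer closest to the file system:
total_io_bytes = posix_nested_bytes

print(round(mpiio_exclusive_time, 2), total_io_bytes)
```

Summing both layers naively would report 128 MiB moved in 1.80 s, overstating both volume and time.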
IPM's feature set for I/O includes:
- aggregate I/O performed by a parallel job;
- task-level I/O, including each POSIX call and each buffer size transacted; and
- the minimum, maximum, and average time spent in each task-level I/O call.
This is essentially the I/O parts list along with the rough cost for each part.
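That parts list might be tabulated per (call, buffer size) pair, in the spirit of IPM's accounting; the timing samples below are hypothetical:

```python
import statistics
from collections import defaultdict

# Hypothetical per-task transactions: (POSIX call, buffer size, seconds).
samples = [
    ("write", 4096, 0.0010), ("write", 4096, 0.0030),
    ("write", 65536, 0.0100), ("read", 4096, 0.0008),
]

# Group timings by call and buffer size, as IPM keys its hash table.
by_key = defaultdict(list)
for call, size, t in samples:
    by_key[(call, size)].append(t)

# Reduce each group to (min, max, avg) time.
table = {key: (min(ts), max(ts), statistics.mean(ts))
         for key, ts in by_key.items()}

print(table[("write", 4096)])  # (min, max, avg) time for 4 KiB writes
```

Comparing such tables across tasks is what exposes the contention and load imbalance discussed next.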
For an individual task without "complex" I/O, this provides a code-level view of the I/O strategy. Across tasks, it demonstrates the success of that strategy given contention and load imbalance. Detection of load imbalance is in