FIGURE 26.4: The POSIX and MPI-IO calls profiled by IPM's I/O module.
Operations such as mmap are often correctly intercepted by POSIX library calls, as they communicate with the kernel through system calls. IPM intercepts POSIX I/O and reports performance profiles through its standard double open-address hashing technique. An inventory of the I/O conducted during execution, with data volumes and timings, is reported when execution is complete. A pre-configured list of the POSIX calls supported by IPM is prototyped in the file ipm.h.
A C library (preloaded or linked) is used to instrument execution. It can be connected to the execution and/or compile-time environment through modules or user environment customization. When adopted by user communities or HPC user facilities, IPM provides a barometer for sustained application performance delivered in the actual context of using the computer as a tool for science. A shared log directory collects the body of performance profiles, each in its own XML file. One or more XML files can be parsed with the IPM parse utility or with a wide variety of custom parsers and report generators. This data collection strategy raises scalability concerns: very large numbers of files and the time spent in XML parsing are often the first limits to be stretched. Large-scale data analytics thus have a role in HPC performance engineering, in addition to their more familiar roles in physics, chemistry, and biology.
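A custom parser along these lines can be sketched as follows; the XML schema here is invented for illustration and does not reproduce IPM's actual log format:

```python
import xml.etree.ElementTree as ET
from io import StringIO

# Two hypothetical per-job profiles; in practice these would be files
# collected from a shared log directory.
profiles = [
    "<profile><io call='write' bytes='1048576' time='0.21'/>"
    "<io call='read' bytes='524288' time='0.09'/></profile>",
    "<profile><io call='write' bytes='2097152' time='0.40'/></profile>",
]

# Aggregate bytes and seconds per POSIX call across all profiles.
totals = {}
for text in profiles:
    for entry in ET.parse(StringIO(text)).getroot():
        call = entry.get("call")
        b, t = totals.get(call, (0, 0.0))
        totals[call] = (b + int(entry.get("bytes")),
                        t + float(entry.get("time")))

print(totals["write"])  # aggregated (bytes, seconds) for write
```

With many thousands of such files, the per-file parse cost is exactly where the scalability limits mentioned above begin to bite.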
It is important to recognize that IPM modules may be nested hierarchically. The MPI-IO layer will often invoke POSIX calls for the I/O transactions to the file system, and IPM intercepts and reports both layers. This can provide corroborating evidence of I/O performance loss, but the nesting must be recognized so as not to double count the reported I/O volumes and times.
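The bookkeeping needed to avoid double counting can be illustrated with made-up numbers: subtract the nested POSIX time from the inclusive MPI-IO time, and attribute the byte volume to a single layer only:

```python
# Illustrative numbers, not IPM output: an MPI-IO write wrapper measured
# 1.00 s inclusive, during which nested POSIX writes took 0.80 s for 64 MiB.
mpiio_inclusive_time = 1.00
posix_nested_time = 0.80
posix_nested_bytes = 64 * 2**20

# Exclusive MPI-IO time (library overhead above the POSIX layer):
mpiio_exclusive_time = mpiio_inclusive_time - posix_nested_time

# Count the bytes once, at the layer closest to the file system:
total_io_bytes = posix_nested_bytes

print(round(mpiio_exclusive_time, 2), total_io_bytes)
```

Summing both layers naively would report 128 MiB moved in 1.80 s, overstating both volume and time.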
IPM's feature set for I/O includes:
- aggregate I/O performed by a parallel job;
- task-level I/O, including each POSIX call and each buffer size transacted; and
- the minimum, maximum, and average time spent in each task-level I/O call.
This is essentially the I/O parts list along with the rough cost for each part.
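That parts list might be tabulated per (call, buffer size) pair, in the spirit of IPM's accounting; the timing samples below are hypothetical:

```python
import statistics
from collections import defaultdict

# Hypothetical per-task transactions: (POSIX call, buffer size, seconds).
samples = [
    ("write", 4096, 0.0010), ("write", 4096, 0.0030),
    ("write", 65536, 0.0100), ("read", 4096, 0.0008),
]

# Group timings by call and buffer size, as IPM keys its hash table.
by_key = defaultdict(list)
for call, size, t in samples:
    by_key[(call, size)].append(t)

# Reduce each group to (min, max, avg) time.
table = {key: (min(ts), max(ts), statistics.mean(ts))
         for key, ts in by_key.items()}

print(table[("write", 4096)])  # (min, max, avg) time for 4 KiB writes
```

Comparing such tables across tasks is what exposes the contention and load imbalance discussed next.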
For an individual task without "complex" I/O, this provides a code-level view of the I/O strategy. Across tasks, it demonstrates the success of that strategy given contention and load imbalance. Detection of load imbalance is in