wrapped symbol names and expand these internally to construct the
appropriate command line. TAU's compiler scripts have been updated to
automatically add the necessary flags to the linker command line when the
user sets a special I/O instrumentation flag (-optTrackIO) in the
TAU_OPTIONS environment variable. Section 25.3 describes this approach in
greater detail with regard to GCRM profiling.
25.2.4 Instrumented External I/O Libraries
When a user needs to evaluate the time spent in uninstrumented I/O
libraries, such as HDF5 (see Chapter 16) and other system libraries, it is
important to be able to generate custom, user-directed wrapper libraries.
These wrapper libraries may be preloaded at runtime or relinked to create an
instrumented binary using linker-based instrumentation as described above.
However, building these libraries manually can be cumbersome. TAU
automates the creation of these wrapper libraries with the tau_gen_wrapper
tool.
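
The idea behind a wrapper library can be illustrated with a minimal Python
sketch. This is only an analogy, not TAU's implementation and not the output
of its wrapper generator: a wrapper intercepts each call to an I/O routine,
records the elapsed time and the number of bytes moved, and forwards the call
to the real routine. The names track_io and io_stats are hypothetical and
were chosen for this illustration.

```python
import functools
import io
import time
from collections import defaultdict

# Per-routine totals (hypothetical layout): call count, bytes moved, seconds spent.
io_stats = defaultdict(lambda: {"calls": 0, "bytes": 0, "seconds": 0.0})


def track_io(func):
    """Wrap an I/O callable so that every call is timed and its byte count recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)  # forward the call to the real routine
        elapsed = time.perf_counter() - start
        entry = io_stats[func.__name__]
        entry["calls"] += 1
        entry["seconds"] += elapsed
        # write() returns a byte count; read() returns the data itself.
        entry["bytes"] += result if isinstance(result, int) else len(result)
        return result
    return wrapper


# An in-memory stream stands in for the uninstrumented library being measured.
buffer = io.BytesIO()
write = track_io(buffer.write)
read = track_io(buffer.read)

write(b"x" * 4096)
write(b"y" * 1024)
buffer.seek(0)
data = read()

for name, entry in io_stats.items():
    print(name, entry["calls"], "calls,", entry["bytes"], "bytes")
```

A compiled wrapper library follows the same pattern, except that the
interception happens at link time or load time rather than through a
decorator, so the application source does not need to change.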
25.3 Success Stories
The Global Cloud Resolving Model (GCRM) [3] models climate on the
entire globe at a horizontal grid spacing of at least 4 km, and a vertical
dimension on the order of 256 layers resulting in over 10 billion cells. A single
cell-based variable written in single precision will require approximately 43
GB of disk storage. Corner data will require 85 GB and edge data 128 GB.
A single snapshot of history data will require 1.8 TB of storage as currently
configured. Climate scientists will want to write data as frequently as possible
(down to the order of minutes) while maintaining an I/O cost below 10%
of the overall simulation. The efficiency of the I/O is therefore a critical
requirement. Understanding and optimizing the behavior of the I/O system
for an application is difficult for several reasons. First, there are several
layers in the I/O stack, some of which are proprietary software. Second,
there are many options for controlling these layers, ranging from optional
arguments to hints to alternative APIs. Third, there are often multiple
implementations of some of the layers.
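
The storage figures and the 10% I/O budget quoted above can be turned into a
rough bandwidth requirement. The sketch below is a back-of-the-envelope
calculation under stated assumptions: the exact cell count (10.75 billion) is
chosen only to be consistent with "over 10 billion cells" and roughly 43 GB
per variable, and the 600-second wall-clock interval between snapshots is a
hypothetical value, not one taken from the GCRM configuration.

```python
GB = 1e9   # decimal gigabyte
TB = 1e12  # decimal terabyte


def variable_size_bytes(num_values, bytes_per_value=4):
    """Size of one variable: one single-precision (4-byte) value per grid element."""
    return num_values * bytes_per_value


def min_write_bandwidth(snapshot_bytes, wall_seconds_between_snapshots, io_fraction=0.10):
    """Lowest sustained bandwidth (bytes/s) that keeps I/O within io_fraction of the run."""
    io_budget_seconds = io_fraction * wall_seconds_between_snapshots
    return snapshot_bytes / io_budget_seconds


# Assumed cell count, consistent with "over 10 billion cells" and about 43 GB.
cells = 10_750_000_000
cell_gb = variable_size_bytes(cells) / GB

# Hypothetical cadence: one 1.8 TB snapshot per 600 s of wall-clock time.
needed = min_write_bandwidth(1.8 * TB, 600.0)

print(f"cell variable: {cell_gb:.1f} GB")
print(f"required write bandwidth: {needed / GB:.1f} GB/s")
```

Under these assumed inputs, the I/O system would have to sustain about
30 GB/s to stay within the 10% budget, which illustrates why the efficiency
of every layer in the stack matters.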
Profiling all the layers of the GCRM I/O is necessary in order to determine
where the true bottlenecks reside. TAU provides the capabilities both to
look deep into the various API layers and to organize and analyze the
numerous configurations under evaluation. For instance, application phases
were profiled, and read and write bandwidths were evaluated for each phase.
Figure 25.1 shows the data for each file and read operation collected by TAU.
Here we see how the MPI-IO layer internally calls the POSIX I/O layer.
Figure 25.2 shows the peak I/O bandwidth and I/O volume for read calls on each
 