practice often the primary source of I/O performance loss. Root causes for
this imbalance include many-to-few I/O strategies, file system striping, and
congestion of I/O due to overlapping I/O operations. I/O performance losses,
balanced or not, are sometimes due to the transfer (buffer) sizes of I/Os being
so small that transactional overheads are high, or due to synchronous locking
I/O operations.
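
To make the transfer-size point concrete, the C sketch below writes the same 4 MiB of data twice, once as many 64-byte writes and once as a single buffered write, so the per-call overhead of the small transfers can be measured directly. The file names, sizes, and timing helper are illustrative assumptions, not taken from any of the profiled applications.

/* Sketch: the same 4 MiB written as many 64-byte transfers and as one
 * large transfer, to expose per-call (transactional) overhead. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static double now(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main(void) {
    const size_t small = 64, count = 1 << 16;      /* 64 B x 65536 = 4 MiB */
    char *chunk  = malloc(small);
    char *buffer = malloc(small * count);
    memset(chunk,  'x', small);
    memset(buffer, 'x', small * count);

    /* Many tiny writes: every call pays the per-operation overhead. */
    int fd = open("tiny_writes.dat", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    double t0 = now();
    for (size_t i = 0; i < count; i++)
        if (write(fd, chunk, small) < 0) perror("write");
    close(fd);
    printf("64 B writes:  %.3f s\n", now() - t0);

    /* The same bytes in one transfer: the overhead is amortized. */
    fd = open("one_write.dat", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    t0 = now();
    if (write(fd, buffer, small * count) < 0) perror("write");
    close(fd);
    printf("single write: %.3f s\n", now() - t0);

    free(chunk);
    free(buffer);
    return 0;
}
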
When the I/O load is balanced across tasks, the next question is whether
the sustained rates are achieving the I/O rates that the storage system is
expected to deliver. If not, what is the underlying cause of the loss? In
some cases the loss is due to the I/O strategy itself, and in other cases it is
due to resource scheduling or contention that lies outside the application's
control. Defensive I/O strategies are therefore sought as much as absolutely
optimal ones. I/O hangs are a notorious source of vexation among HPC
enthusiasts, and the notion of defense extends to the lower end of performance
as well. Most HPC I/O goes unmonitored, and this is likely a rich area for
investigation to guide future data science architectures [5].
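
As one way to ask the balance question in code, the MPI sketch below assumes each rank has already timed its own I/O phase and then reduces those measurements to rank 0, which reports the spread and the aggregate rate for comparison against the rate the storage system is expected to deliver. The function name, the placeholder measurements, and the 2x-mean imbalance threshold are assumptions made for the example.

/* Sketch: each rank contributes the time and bytes of its own I/O phase
 * (measured elsewhere); rank 0 summarizes the spread across ranks. */
#include <mpi.h>
#include <stdio.h>

static void report_io_balance(double io_seconds, double io_bytes) {
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double tmin, tmax, tsum, btotal;
    MPI_Reduce(&io_seconds, &tmin,   1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&io_seconds, &tmax,   1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&io_seconds, &tsum,   1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&io_bytes,   &btotal, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        double tmean = tsum / nranks;
        printf("I/O time per rank: min %.2f s  mean %.2f s  max %.2f s\n",
               tmin, tmean, tmax);
        printf("aggregate rate: %.1f MB/s (compare with the expected rate)\n",
               btotal / tmax / 1.0e6);
        if (tmax > 2.0 * tmean)
            printf("warning: slowest rank is more than 2x the mean; "
                   "the I/O load looks imbalanced\n");
    }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    /* Placeholder values; a real code would time its own I/O phase. */
    report_io_balance(1.0, 1.0e8);
    MPI_Finalize();
    return 0;
}
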
There is much to be gained from continued research in this area. As exascale
architectures emerge, the pathways from compute core to disk will become
more complex, as will their performance. It is interesting to consider
architectural simulation in the design and provisioning of such systems. Given
a body of existing I/O profiles, can one map these into an estimate of the
performance that would be possible on a proposed architecture? To what degree
can we construct useful models for the design of future I/O systems [3]?
To make actionable decisions about I/O, it is important to build models
from profiles that are tightly integrated with application performance as it
happens. The following sections draw from HPC application I/O scenarios
observed at NERSC using IPM.
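
As a minimal illustration of capturing an application's I/O activity as it happens, the sketch below interposes on the POSIX write() call through LD_PRELOAD and accumulates the time and bytes spent in it. This is only a sketch in the spirit of interposition-based profilers such as IPM, not IPM's actual implementation; the counter names and the report printed at exit are invented for the example.

/* Sketch: interpose on write() via LD_PRELOAD and accumulate the time
 * and bytes spent in it; a profile is printed when the program exits. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static double io_time = 0.0;     /* seconds accumulated inside write() */
static long long io_bytes = 0;   /* bytes written                      */

static double now(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

/* Wrapper: forward to the real write() and record time and volume. */
ssize_t write(int fd, const void *buf, size_t count) {
    static ssize_t (*real_write)(int, const void *, size_t) = NULL;
    if (!real_write)
        real_write = (ssize_t (*)(int, const void *, size_t))
                     dlsym(RTLD_NEXT, "write");
    double t0 = now();
    ssize_t ret = real_write(fd, buf, count);
    io_time += now() - t0;
    if (ret > 0)
        io_bytes += ret;
    return ret;
}

/* Report the accumulated profile when the application exits. */
__attribute__((destructor))
static void report(void) {
    fprintf(stderr, "write(): %lld bytes in %.3f s\n", io_bytes, io_time);
}

Compiled into a shared object (for example, gcc -shared -fPIC -o iotrace.so iotrace.c -ldl) and loaded with LD_PRELOAD=./iotrace.so, such a wrapper captures the application's write traffic without modifying the application itself, and the same idea extends to reads and to MPI-IO calls.
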
26.2 Success Stories
26.2.1 Chombo's ftruncate
Chombo's ftruncate is a simple case study that shows why profiling is
best done in a production setting. Figure 26.5 shows a wide range of I/O
tuning techniques applied by HPC experts to the Chombo code. The dominant
increase in I/O bandwidth is attributable to removing an extraneous POSIX
call from the production-deployed parallel I/O libraries. A profiling interface
that captures the application's I/O activity, the operating system's, or
preferably both is often enough to reveal which type of I/O and/or which
system resources drive the time spent in I/O. In some cases the improvements
listed above took place in how the middleware is used, and in other cases
changes were made directly to the middleware. For instance, the remove