Chapter 24
Overview of I/O Benchmarking
Katie Antypas and Yushu Yao
National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory
24.1 Introduction
24.2 I/O Benchmarking
24.3 Why Profile I/O in Scientific Applications?
24.4 Brief Introduction to I/O Profilers
24.5 I/O Profiling at NERSC
24.5.1 Application Profiling Case Studies
24.5.1.1 Checkpointing Too Frequently
24.5.1.2 Reading Small Input Files from Every Rank
24.5.1.3 Using the Wrong File System
24.6 Conclusion
Bibliography
24.1 Introduction
For users of HPC systems, I/O remains a challenge in achieving high performance on large-scale parallel systems. There are numerous reasons for I/O bottlenecks. First, an I/O subsystem may be undersized for a particular HPC compute partition. A great challenge for HPC centers is deciding how to divide the budget among the components of a system. The appropriate balance between the I/O partition and the compute partition depends on the system's workload as well as its scheduling policies. Second, depending on how a system is architected, concurrent applications may share limited I/O resources, leading to lower performance. When multiple applications run concurrently, contention for shared resources such as I/O nodes, network components, metadata servers, and spinning disks, among others, can increase latency and reduce bandwidth. Last, how a user reads and writes data can greatly affect application performance (also discussed in Chapters 19-23). A user who performs I/O in a suboptimal manner may see poor performance as a result. An application that performs many small writes may run into lock contention