in a block-based file system, while an application that tries to open many files
concurrently may suffer from reduced metadata performance.
There are a number of techniques that application developers and HPC
system administrators can use to assess the performance of an application or
the I/O subsystem. This chapter introduces I/O benchmarking, file system
monitoring, and I/O profiling for these purposes.
24.2 I/O Benchmarking
To assess a system's I/O capabilities, the HPC community has long used
I/O benchmarks. Benchmarks vary in complexity and purpose, but in general
the goal is either to mimic a workload in a simple way or to test and stress
certain aspects of the file system [19].
At HPC centers, I/O benchmarks are used to test and accept new systems.
They are also used to investigate reports of low performance and to find
bottlenecks in the file system.
I/O workloads run at HPC centers include a variety of access patterns
and interfaces [19, 13]. Within the DOE Office of Science user community,
applications use the POSIX, MPI-IO, HDF5, and netCDF interfaces. Some
of these applications write to and read from single shared files while others
use a file-per-processor format. Furthermore, applications display a variety of
access patterns, from large-block, append-only writes, to small-block, bursty
I/O, to reading large input files [21].
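As a minimal sketch of these two output styles (assuming MPI is available;
the file names out.NNNNN and shared.out and the tiny payloads are invented
for illustration), each process below first writes its own POSIX file and
then writes at a disjoint offset of one shared file through MPI-IO:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* File-per-process: each rank writes its own POSIX file. */
        char name[64];
        snprintf(name, sizeof(name), "out.%05d", rank);
        FILE *fp = fopen(name, "w");
        fprintf(fp, "data from rank %d\n", rank);
        fclose(fp);

        /* Single shared file: all ranks open one file collectively
         * and each writes one int at a disjoint offset. */
        MPI_File fh;
        int val = rank;
        MPI_File_open(MPI_COMM_WORLD, "shared.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(int), &val,
                          1, MPI_INT, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }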
Figure 24.1 shows some of the common access patterns that result from
moving data between memory and a file. In the simplest case, the data is
contiguous in both memory and the file. Alternatively, the data could be
contiguous in memory, but not in the file; contiguous in the file, but not in
memory; or contiguous in neither. The reasons for different I/O access
patterns vary widely across scientific applications. A developer may choose
a specific in-memory data structure because it best suits a particular
computational algorithm, but may choose a different data layout in the file
to facilitate post-processing and analysis.
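To make the second of these cases concrete, the following minimal C/MPI
sketch (the sizes, the file name strided.out, and the interleaving scheme
are illustrative assumptions) writes a buffer that is contiguous in memory
into a strided, noncontiguous layout in a shared file by installing a
derived-datatype file view:

    #include <mpi.h>

    #define BLOCK  4   /* ints per block (illustrative)  */
    #define NBLOCK 8   /* blocks per rank (illustrative) */

    int main(int argc, char **argv) {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int buf[BLOCK * NBLOCK];            /* contiguous in memory */
        for (int i = 0; i < BLOCK * NBLOCK; i++)
            buf[i] = rank;

        /* File view: this rank owns one BLOCK-int block out of
         * every nprocs blocks, so ranks interleave in the file. */
        MPI_Datatype filetype;
        MPI_Type_vector(NBLOCK, BLOCK, BLOCK * nprocs, MPI_INT,
                        &filetype);
        MPI_Type_commit(&filetype);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "strided.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, (MPI_Offset)rank * BLOCK * sizeof(int),
                          MPI_INT, filetype, "native", MPI_INFO_NULL);
        MPI_File_write_all(fh, buf, BLOCK * NBLOCK, MPI_INT,
                           MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Type_free(&filetype);
        MPI_Finalize();
        return 0;
    }

With the view in place, the collective MPI_File_write_all call allows the
MPI-IO layer to aggregate the noncontiguous accesses, which is typically
much faster than issuing many small independent writes.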
With such a variety of I/O patterns, it is difficult for a single I/O
benchmark to represent a complex multi-user environment. Instead, synthetic
I/O benchmarks are used to test and isolate specific I/O components in a
storage subsystem or to measure a distinct I/O pattern [19]. Many existing
I/O benchmarks do not reflect a complex HPC workload, either because they
only test POSIX APIs or because they only measure serial I/O performance.
The IOR [7] benchmark is one of the most flexible because simple input
parameters select among I/O APIs such as POSIX, MPI-IO, HDF5, and
Parallel-NetCDF. Parameters also control whether output goes to
a single shared file or to one file per process. Additionally, a user can
control the size of individual transfers and the total amount of data each
process moves.
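As a sketch of how these parameters look in practice (exact option names
may vary with the IOR release; the process count and scratch path are
illustrative), one might run:

    mpirun -np 64 ./ior -a MPIIO -w -r -t 4m -b 1g -F -o /scratch/ior.test

Here -a selects the MPI-IO API, -w and -r request write and read phases,
-t and -b set the transfer and block sizes, -F requests file-per-process
output (omitting it yields a single shared file), and -o names the test
file.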