factor causing poor I/O performance is the file system overhead of data consistency control. A POSIX-compliant file system, such as GPFS, Lustre, or Panasas, must guarantee I/O atomicity, sequential data consistency, and cache coherence. These requirements are commonly enforced through a locking mechanism. Atomicity requires a lock for every I/O call, and locks can easily degrade the degree of I/O parallelism among concurrent file operations. Meeting these POSIX requirements has been reported as a major obstacle to parallel I/O performance [38].
Many of the fundamental problems that prevent parallel I/O from achieving the hardware's potential lie in the file systems' obsolete protocols, which are not suited to today's high-performance computing systems. File systems have long been designed for sequential access and have treated each I/O request independently. This strategy works well in nonshared or distributed environments, but poorly for parallel applications, where the majority of I/O requests access data that are part of global data structures. For instance, when a program reads a two-dimensional array and partitions it among all running processes, each process requests a subarray from the underlying file system. However, each of these requests is handled separately by the file system for atomicity, consistency, and coherence control. To address this issue, future
file systems must support parallel I/O natively. New programming interfaces
will allow applications to supply high-level data access information to the file
systems. For example, a new interface could tell the file system that a parallel write by a group of processes should be treated as a single request. A file system's atomicity and consistency controls could then skip checking for internal conflicts among the parallel I/O requests.
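To make the two-dimensional example concrete, the following sketch computes the byte ranges one process touches when it reads its block of a row-major array stored contiguously in a shared file. All sizes, the process-grid layout, and the function name are illustrative assumptions, not taken from the text; the point is that each process's single logical subarray request decomposes into many small, noncontiguous regions, each of which a POSIX file system must lock and order independently.

```python
# Hypothetical sketch: byte ranges a process must access when reading
# its block of a 2D row-major array stored contiguously in one shared file.
ELEM_SIZE = 8                  # bytes per element (e.g., a double)
GLOBAL_ROWS, GLOBAL_COLS = 8, 8
PROC_ROWS, PROC_COLS = 2, 2    # 2x2 process grid -> 4 processes

def subarray_ranges(rank):
    """Return the (offset, length) byte ranges owned by `rank`."""
    br, bc = GLOBAL_ROWS // PROC_ROWS, GLOBAL_COLS // PROC_COLS
    pr, pc = divmod(rank, PROC_COLS)
    ranges = []
    for r in range(pr * br, (pr + 1) * br):
        # One contiguous run per array row inside this process's block.
        start = (r * GLOBAL_COLS + pc * bc) * ELEM_SIZE
        ranges.append((start, bc * ELEM_SIZE))
    return ranges

# Even this tiny 8x8 array yields four separate file regions per process;
# a file system that treats each region independently must lock each one.
for rank in range(PROC_ROWS * PROC_COLS):
    print(rank, subarray_ranges(rank))
```

A parallel-aware interface of the kind described above would let the file system see all of these regions, across all processes, as one conflict-free collective request.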
Another interesting result from the survey is that most programmers use customized file formats rather than standardized self-describing formats such as netCDF or HDF5. The one-file-per-process approach is popular because of its simplicity; users are reluctant to adopt complex I/O methods unless significant performance gains or other benefits are promised. As the data generated by today's parallel applications reach the scale of terabytes or petabytes, scientists are more willing to consider alternative I/O methods that meet requirements such as portability and ease of management. However, tradeoffs between performance and productivity will always exist.
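The simplicity that makes one-file-per-process popular can be shown in a few lines. This is a minimal sketch, not any particular application's scheme; the file-naming pattern and data are hypothetical assumptions. Each rank writes its own file, so no shared-file locking or coordination is needed, but a run produces as many files as processes, which is exactly the management burden that pushes users toward self-describing shared-file formats.

```python
# Minimal sketch of the one-file-per-process pattern (names are illustrative).
import os
import tempfile

def write_per_process(outdir, rank, data):
    """Write this rank's data to its own file, e.g. output.0002.dat."""
    path = os.path.join(outdir, f"output.{rank:04d}.dat")
    with open(path, "wb") as f:
        f.write(data)
    return path

outdir = tempfile.mkdtemp()
# Simulate four ranks, each dumping its local buffer to a private file.
paths = [write_per_process(outdir, r, bytes([r] * 4)) for r in range(4)]
# Four separate files now exist: trivial to write, harder to manage at scale.
```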
2.6 Summary
In this chapter we have discussed at a high level the hardware and software
that work together to provide parallel data storage and access capabilities for
HPC applications. These technologies build on the disk array and RAID techniques discussed in the preceding chapter, and they provide a critical infrastructure that allows applications to conveniently and efficiently use storage