complex operations from many small ones is minimal. When a network link
is introduced between the file system and the operating system, however, the
costs change. The complex access patterns of parallel applications described
previously require many individual POSIX operations, each incurring at least one
network round trip. Performance drops quickly. Network file system
developers combat this problem by enabling read-ahead, caching, and write-
back on file system clients (the operating system instances accessing the file
system). These allow the client to avoid network communication when pre-
dictable patterns are present and adequate memory is free for caching.
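The effect of client-side read-ahead and caching can be sketched with a toy model. The class below is purely illustrative: `ReadAheadClient`, its `round_trips` counter, and the in-memory `server_data` that stands in for a remote file are all hypothetical names, not part of any real file system client. It fetches whole blocks and serves later small reads from the cached block, so a predictable sequential pattern costs far fewer simulated network requests than one request per read.

```python
class ReadAheadClient:
    """Toy client: fetches whole blocks from a 'server' and serves reads from cache."""

    def __init__(self, server_data, block_size):
        self.server_data = server_data   # stands in for the remote file
        self.block_size = block_size
        self.cache = {}                  # block index -> cached bytes
        self.round_trips = 0             # number of simulated network requests

    def _fetch_block(self, index):
        # Each block fetch models one network round trip.
        self.round_trips += 1
        start = index * self.block_size
        return self.server_data[start:start + self.block_size]

    def read(self, offset, length):
        out = bytearray()
        end = offset + length
        while offset < end:
            index = offset // self.block_size
            if index not in self.cache:
                self.cache[index] = self._fetch_block(index)
            within = offset % self.block_size
            take = min(end - offset, self.block_size - within)
            chunk = self.cache[index][within:within + take]
            if not chunk:
                break                    # read past the end of the file
            out += chunk
            offset += len(chunk)
        return bytes(out)


data = bytes(range(256)) * 16            # 4096 bytes of sample data
client = ReadAheadClient(data, block_size=1024)
# 64 sequential 16-byte reads cover exactly the first 1024-byte block.
pieces = [client.read(i * 16, 16) for i in range(64)]
```

In this sketch the 64 small reads trigger a single block fetch; without the cache, each read would have required its own request. The model ignores cache eviction and memory limits, which the text notes are a precondition for caching to help.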
When caching is introduced, the consistency semantics of POSIX then be-
come a significant challenge. Because operations must be sequentially consis-
tent, the file system must strictly manage concurrent access to file regions.
In parallel file systems, access management typically is accomplished with a
single-writer, multiple-reader, distributed-range locking capability that allows
concurrent reading in the absence of writers but retains sequential consistency
in the presence of writers. This is a proven technique, but maintaining locks
introduces communication again, and the locks become a state that must be
tracked in case of client failure.
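The single-writer, multiple-reader rule over byte ranges can be illustrated with a small in-memory lock table. This is only a sketch under simplifying assumptions: `RangeLockManager`, `try_acquire`, and `release_all` are hypothetical names, the table lives in one process rather than being distributed, and requests that conflict are simply refused instead of queued. It is not how any particular parallel file system implements its lock manager.

```python
class RangeLockManager:
    """Toy single-writer, multiple-reader lock table over byte ranges [start, end)."""

    def __init__(self):
        # Each entry is (owner, start, end, mode), where mode is "r" or "w".
        self.held = []

    def try_acquire(self, owner, start, end, mode):
        for other, s, e, m in self.held:
            overlaps = start < e and s < end
            # Locks conflict when they overlap, belong to different owners,
            # and at least one of them is a write lock.
            if overlaps and other != owner and "w" in (mode, m):
                return False
        self.held.append((owner, start, end, mode))
        return True

    def release_all(self, owner):
        """Drop every lock held by one client, for example after it fails."""
        self.held = [lock for lock in self.held if lock[0] != owner]


locks = RangeLockManager()
read_a = locks.try_acquire("client0", 0, 100, "r")      # granted
read_b = locks.try_acquire("client1", 50, 150, "r")     # granted: readers may overlap
write_c = locks.try_acquire("client2", 90, 120, "w")    # refused: overlaps both readers
write_d = locks.try_acquire("client2", 150, 200, "w")   # granted: disjoint range
locks.release_all("client0")
locks.release_all("client1")
write_e = locks.try_acquire("client2", 90, 120, "w")    # granted once the readers are gone
```

The `held` list is exactly the state the text refers to: if a client disappears, something must notice and call the equivalent of `release_all` before other clients can make progress on those ranges.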
In parallel applications, developers are accustomed to synchronizing pro-
cesses when necessary and rarely overwrite file regions simultaneously. In or-
der to enable higher-performance application I/O than is possible through
the POSIX API, a richer I/O language was needed that enables developers to
describe and coordinate access across many application processes, and that
enables the greatest degree of concurrency possible.
2.4.3 MPI-IO
The MPI-IO interface is a component of the MPI-2 message-passing interface
standard [12] and defines an I/O interface for use in applications using the MPI
programming model. The model for data in a file is the same as in POSIX
I/O: a stream of bytes that may be randomly accessed. The major differences
between POSIX I/O and MPI-IO lie in the increased descriptive capabilities
of the MPI-IO interface. Language bindings are provided for C, C++, and
Fortran, among others.
One trend that emerged from access pattern studies was that applications
often access data regions that are not consecutive, that is, noncontiguous, in
the file. The POSIX interface forces applications exhibiting these patterns
to perform an access per region (see footnote 1); this constraint makes this type of access
very inconvenient to program, and performing many small operations often
Footnote 1: POSIX does define the lio_listio call that may be used to access noncontiguous regions, but it
places a low limit on how many regions may be accessed in one call (often 16), limits concurrency
by forcing the underlying implementation to perform these accesses in order, and requires that
memory regions and file regions be of the same size. These constraints make lio_listio of limited
use in scientific applications.
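The one-access-per-region pattern that the POSIX interface imposes can be sketched as follows. The example uses Python's `os.pread`, which wraps the POSIX `pread` call; the helper `read_regions`, the 8-byte record size, and the sample file contents are assumptions made for illustration only.

```python
import os
import tempfile


def read_regions(fd, regions):
    """Read each (offset, length) region with its own pread call."""
    return [os.pread(fd, length, offset) for offset, length in regions]


with tempfile.TemporaryFile() as f:
    # Sample file: 8 records of 8 bytes each (bytes 0..63).
    f.write(bytes(range(64)))
    f.flush()
    # Every other record: four regions that are noncontiguous in the file.
    regions = [(offset, 8) for offset in range(0, 64, 16)]
    chunks = read_regions(f.fileno(), regions)
```

Four regions require four separate calls here. On a local file system that overhead is small, but on a network file system each call may cost a round trip, which is the inefficiency the surrounding text describes.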