bioinformatics files were stored each on a single IOS to minimize the cost
of metadata management. Compute nodes run a file system component that
allows them to access data distributed in this system. The communication
network provides network paths between clients and servers, enabling clients
to take advantage of the storage hardware at the servers and, in particular,
allowing for very high aggregate performance when multiple clients access the
parallel file system simultaneously.
The location of file data, owner, permissions, and creation and modification
dates for a file must also be maintained by the file system. This information,
called metadata, might be stored on the same IOSs holding data or might
be kept on a separate server. In our example system we show the directory
structure distributed across the IOSs; in such a case the file metadata is likely
distributed as well.
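As an illustration only, the metadata described above can be pictured as a small record kept per file. The sketch below is not taken from any particular file system; the class name `FileMetadata`, its field names, and the `(server, offset, length)` extent layout are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FileMetadata:
    """Hypothetical per-file metadata record (names are illustrative)."""
    owner: str
    permissions: int          # UNIX-style mode bits, e.g. 0o644
    created: float            # creation time, seconds since the epoch
    modified: float           # last modification time
    # Location of file data: (IOS id, byte offset on that IOS, length)
    extents: List[Tuple[int, int, int]] = field(default_factory=list)

    def servers(self) -> List[int]:
        """Return the sorted ids of the IOSs that hold part of this file."""
        return sorted({server for server, _, _ in self.extents})


# Example: a file whose data is spread over two IOSs (ids 0 and 2).
meta = FileMetadata(
    owner="alice",
    permissions=0o644,
    created=0.0,
    modified=0.0,
    extents=[(2, 0, 65536), (0, 0, 65536), (2, 65536, 65536)],
)
```

A client would consult such a record to learn which servers to contact for a given byte range, whether the record lives on the IOSs themselves or on a separate metadata server.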
2.1.1 Data Consistency and Coherence
Data consistency and cache coherence problems have been studied ever since
storage systems became shareable resources. Consistency and coherence define
the outcomes of concurrent I/O operations on a shared file. The problems
occur when at least one operation is a write and the file system must ensure
consistent results. While different levels of data consistency have been defined,
the best known is sequential consistency, which is also adopted by most UNIX
file systems. It requires the results to be as if the multiple I/O operations
happened in some sequential order. For example, given two write requests
overlapping at a certain file location, the contents of the overlaps must come
entirely from either the first write or the second. No interleaved result is
allowed. It is relatively easy for a file system with one server and one disk to
guarantee sequential consistency, but it is difficult for parallel file systems with
more than one server because coordination between servers becomes necessary.
Currently, the most popular solution uses a locking mechanism to enforce data
consistency. Locking provides exclusive access to a requested file region. Such
access, however, also means operation serialization when conflicts exist. As
the number of compute processors in current and future parallel machines
grows to thousands and even millions, guaranteeing such consistency without
degrading the parallel I/O performance is a great challenge.
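The effect of byte-range locking on two overlapping writes can be sketched in a few lines. This is a toy, single-process model, not the lock manager of any real parallel file system: the class `RangeLockManager`, the helper `locked_write`, and the in-memory `bytearray` standing in for a shared file are all assumptions made for illustration.

```python
import threading


class RangeLockManager:
    """Toy lock table granting exclusive access to byte ranges [start, end)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = []  # ranges currently locked, as (start, end) tuples

    def _conflicts(self, start, end):
        # Two half-open ranges overlap if each starts before the other ends.
        return any(s < end and start < e for s, e in self._held)

    def acquire(self, start, end):
        with self._cond:
            while self._conflicts(start, end):
                self._cond.wait()  # conflicting requests are serialized here
            self._held.append((start, end))

    def release(self, start, end):
        with self._cond:
            self._held.remove((start, end))
            self._cond.notify_all()


def locked_write(file_bytes, locks, offset, data):
    """Write data at offset while holding an exclusive lock on that range."""
    end = offset + len(data)
    locks.acquire(offset, end)
    try:
        for i, b in enumerate(data):
            file_bytes[offset + i] = b
    finally:
        locks.release(offset, end)


def demo():
    file_bytes = bytearray(16)  # stands in for a shared file
    locks = RangeLockManager()
    # Two writers whose ranges overlap in bytes 6..9.
    w1 = threading.Thread(target=locked_write,
                          args=(file_bytes, locks, 0, b"A" * 10))
    w2 = threading.Thread(target=locked_write,
                          args=(file_bytes, locks, 6, b"B" * 10))
    w1.start()
    w2.start()
    w1.join()
    w2.join()
    return bytes(file_bytes)


result = demo()
```

Whichever writer obtains its lock first, bytes 6 through 9 of `result` come entirely from one writer, never a mixture. The cost is visible in `acquire`: the second writer waits until the first has finished, which is the serialization the text refers to.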
Since disk drives are currently the most popular storage media, it is important
to understand how their data access mechanism impacts the file systems'
consistency control. Disk drives can be accessed only in fixed-size units, called
disk sectors. File systems allocate disk space for a file in blocks, which consist
of a fixed number of contiguous sectors. The disk space occupied by a file is
thus a whole number of blocks, and files are accessed only in units of blocks. Under
this mechanism, file systems handle an I/O request of an arbitrary byte range
by first allocating block-size system buffers for disk access and then copying
data between the user and system buffers. The same concept applies to RAID,
in which a block consists of sectors distributed across multiple disks. In order