The combination of small form-factor, cost-effective SCSI disks for small computers and the linear block address space led to disk arrays in the early 1990s. A disk array is a set of disks grouped together, usually in a common physical box called the array. It presents itself as a much larger "virtual" disk with a linear block address space, interleaving the virtual disk blocks across the component physical disks. Arrays promised higher aggregate bandwidth, more concurrent random accesses per second, and cost- and volume-efficient large storage systems. But with many more mechanical disk devices in an array, the chance that some component fails also rises. In a paper called "A Case for Redundant Arrays of Inexpensive Disks," Patterson, Gibson, and Katz described a taxonomy of "RAID" levels showing different ways disk arrays could embed redundant copies of stored data [4].
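To make the interleaving concrete, here is a minimal sketch of how an array might map a virtual block number onto a component disk. It assumes simple round-robin striping with a fixed stripe unit (RAID 0 style, no redundancy); the function name `map_virtual_block` and the `stripe_unit` parameter are illustrative, not taken from any particular array controller.

```python
from typing import Tuple


def map_virtual_block(vblock: int, num_disks: int, stripe_unit: int = 1) -> Tuple[int, int]:
    """Map a virtual block number to (disk index, block number on that disk),
    assuming round-robin striping with `stripe_unit` blocks per disk per stripe."""
    if num_disks <= 0 or stripe_unit <= 0:
        raise ValueError("num_disks and stripe_unit must be positive")
    if vblock < 0:
        raise ValueError("vblock must be non-negative")
    unit = vblock // stripe_unit    # which stripe unit holds this block
    offset = vblock % stripe_unit   # position of the block inside that unit
    disk = unit % num_disks         # stripe units are dealt to disks round-robin
    row = unit // num_disks         # number of complete stripes before this unit
    return disk, row * stripe_unit + offset


# Usage: eight virtual blocks on a four-disk array, one block per stripe unit.
print([map_virtual_block(b, num_disks=4) for b in range(8)])
# prints [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```

With a stripe unit of one block, consecutive virtual blocks land on consecutive disks, so a large sequential transfer keeps every disk busy at once; a larger stripe unit keeps small requests on a single disk, which favors concurrent random accesses.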
With redundant copies of data, the failure of a disk could be transparently detected, tolerated, and, with online spare disks, repaired. The leading RAID levels are level 0, nonredundant striping; level 1, duplication of each data disk; and level 5, in which each stripe stores the parity (bitwise XOR) of its data blocks on one disk, with the parity role rotated among the disks, so that a known failed disk can be reconstructed from the XOR of all surviving disks.
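The XOR property behind level 5 fits in a few lines. This sketch assumes equal-length blocks and models a single stripe only (it does not show how RAID 5 rotates the parity block from stripe to stripe); the helper names `xor_blocks` and `reconstruct` are hypothetical.

```python
from functools import reduce
from typing import List, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must have equal length")
    # zip(*blocks) yields, for each byte position, the bytes of every block at that position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe: List[bytes], failed: int) -> bytes:
    """Rebuild the block of a known failed disk as the XOR of all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed]
    return xor_blocks(survivors)


# Usage: three data blocks plus one parity block form a stripe across four disks.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)
stripe = data + [parity]
print(parity.hex(), reconstruct(stripe, failed=1) == data[1])
# prints ee22 True
```

Because XOR is its own inverse, the same routine that computes parity also performs reconstruction. It only works when the position of the failed disk is known, since parity alone cannot tell which block is missing.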
SCSI is still with us, and its lower-cost competitors, advanced technology attachment (ATA) and serial ATA (SATA), share the same linear block address space and embedded independent controller. RAID has been relabeled Redundant Arrays of Independent Disks because the expensive, large form-factor disks have been displaced by relatively inexpensive, smaller form-factor disks. RAID is a core data management tool in all large data systems. Most important, the linear block address space abstraction is the basic and central storage virtualization scheme at work today.
2.2.1.1 General Parallel File System
IBM's General Parallel File System (GPFS) grew out of the Tiger Shark multimedia file system, developed in the mid-1990s. Variants of GPFS are available for both AIX and Linux. GPFS is one of the most widely deployed parallel file systems today.
GPFS implements a block-based file system. Clients either access disk blocks directly through a storage area network (SAN) or access them indirectly through a software layer (called virtual shared disk [VSD] or network shared disk [NSD]) that redirects operations over a network to a remote server, which performs the access on the client's behalf. In large deployments, the cost of connecting every client to the SAN is usually prohibitive, so software-assisted block access is more common.
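The two access paths can be modeled with a toy sketch. Everything here (the `Disk`, `DiskServer`, and `Client` classes and their methods) is hypothetical and greatly simplified; it is not the GPFS, VSD, or NSD interface, and the network hop is reduced to a method call. It only illustrates that a SAN-attached client reads a block itself, while any other client forwards the request to a server that performs the read on its behalf.

```python
from typing import Dict


class Disk:
    """Toy block device mapping block numbers to block contents (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}

    def read(self, block: int) -> bytes:
        return self.blocks[block]


class DiskServer:
    """Hypothetical server that performs block reads on a client's behalf,
    playing the role the text assigns to the VSD/NSD software layer."""

    def __init__(self, disks: Dict[str, Disk]) -> None:
        self.disks = disks
        self.requests_served = 0

    def read(self, disk_name: str, block: int) -> bytes:
        # In a real deployment this request would arrive over the network.
        self.requests_served += 1
        return self.disks[disk_name].read(block)


class Client:
    """Reads directly when SAN-attached to the disk, otherwise via a server."""

    def __init__(self, san_disks: Dict[str, Disk], servers: Dict[str, DiskServer]) -> None:
        self.san_disks = san_disks  # disks this client can reach over the SAN
        self.servers = servers      # disk name -> server that serves that disk

    def read(self, disk_name: str, block: int) -> bytes:
        if disk_name in self.san_disks:
            return self.san_disks[disk_name].read(block)       # direct block access
        return self.servers[disk_name].read(disk_name, block)  # software-assisted access


# Usage: one disk, one server, one SAN-attached client, and one network-only client.
d0 = Disk("d0")
d0.blocks[7] = b"abcd"
server = DiskServer({"d0": d0})
san_client = Client(san_disks={"d0": d0}, servers={"d0": server})
net_client = Client(san_disks={}, servers={"d0": server})
print(san_client.read("d0", 7), net_client.read("d0", 7), server.requests_served)
# prints b'abcd' b'abcd' 1
```

Both clients see the same data; only the network-only client consumes server work, which is the trade the text describes: cheaper client attachment in exchange for load on the servers.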
Because GPFS is a parallel file system, blocks move through multiple paths, usually multiple servers, and are striped across multiple devices to allow concurrent access from many clients and high aggregate throughput. To match the network bandwidth of today's servers, disks used in GPFS deployments are typically combined into RAID arrays. Files are striped across all RAIDs with