Panasas storage clusters are widely available [6, 7]. Shelves include redundant
networking, power supplies, cooling fans, and an integrated UPS for hardware
fault tolerance. Software fault tolerance is provided by three different layers: a
replicated global state database and directory; failover mirroring of metadata
manager state; and a novel declustered, object-based RAID implementation.
A quorum consensus voting protocol is employed on a subset of metadata
managers to coordinate changes in configuration and blade states. These
changes are rare and must be consistent across all blades. The manager that
implements this global state also implements a directory service through
which all blades and clients can find all other services. The domain name
system (DNS) name of the entire cluster resolves to the addresses on which
this directory service is available, so a client's mount table needs little more than
the storage cluster's DNS name to bootstrap all services. Metadata managers,
however, change state far too often to go through this voting protocol, and
their changes do not need global synchronization. Instead, metadata manager
state changes are mirrored on a backup metadata manager, and each manager
journals its changes so that it can recover from a reboot without a full
file system check.
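The quorum consensus voting described above can be illustrated with a minimal majority-vote sketch. This is not Panasas code: the `VotingManager` class, the `propose` function, and the integer version counter are hypothetical, and a real protocol would also handle concurrent proposers and bring lagging replicas up to date. What it shows is the core rule: a configuration change commits only when a strict majority of the voting managers accept it, so any two committed changes share at least one voter.

```python
from dataclasses import dataclass, field


@dataclass
class VotingManager:
    """Hypothetical voting member holding one replica of the global state."""
    name: str
    up: bool = True
    version: int = 0
    state: dict = field(default_factory=dict)

    def vote(self, version: int) -> bool:
        # Accept only if reachable and the proposal is newer than our replica.
        return self.up and version > self.version

    def apply(self, version: int, key: str, value: str) -> None:
        self.version = version
        self.state[key] = value


def propose(managers: list[VotingManager], key: str, value: str) -> bool:
    """Commit a change only if a strict majority of all voters accept it."""
    # Hypothetical proposer: next version is one past the newest reachable replica.
    next_version = max((m.version for m in managers if m.up), default=0) + 1
    yes = [m for m in managers if m.vote(next_version)]
    # Majority is counted over ALL voting members, reachable or not.
    if 2 * len(yes) <= len(managers):
        return False
    for m in yes:
        m.apply(next_version, key, value)
    return True


managers = [VotingManager(f"mgr{i}") for i in range(3)]
print(propose(managers, "blade7", "offline"))  # True: 3 of 3 vote yes
managers[0].up = managers[1].up = False
print(propose(managers, "blade7", "online"))   # False: only 1 of 3 reachable
```

Counting the majority over the full membership, rather than over whoever happens to be reachable, is what prevents a partitioned minority from committing a conflicting change.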
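The mirroring and journaling of metadata manager state can be sketched as follows. In this hypothetical `MetadataManager`, an in-memory list stands in for a persistent journal, and each update is logged, forwarded to a backup manager, and only then applied; that ordering is an assumption made for illustration, not a description of Panasas internals. After a simulated reboot, `recover` rebuilds state by replaying the journal rather than scanning the file system.

```python
import json


class MetadataManager:
    """Hypothetical manager: journal a change, mirror it to a backup, then apply it."""

    def __init__(self, backup=None):
        self.journal: list[str] = []      # stands in for a persistent on-disk log
        self.state: dict[str, str] = {}   # in-memory metadata, lost on reboot
        self.backup = backup              # mirror target, or None for the backup itself

    def update(self, key: str, value: str) -> None:
        record = json.dumps({"key": key, "value": value})
        self.journal.append(record)           # 1. log the change
        if self.backup is not None:
            self.backup.update(key, value)    # 2. mirror it to the backup manager
        self.state[key] = value               # 3. apply it in memory

    def recover(self) -> None:
        """After a reboot, replay the journal instead of checking the file system."""
        self.state = {}
        for record in self.journal:
            entry = json.loads(record)
            self.state[entry["key"]] = entry["value"]


backup = MetadataManager()
primary = MetadataManager(backup=backup)
primary.update("/home/a", "obj:17")
primary.state.clear()   # simulate losing in-memory state in a reboot
primary.recover()       # state comes back from the journal alone
```

Because the backup receives every change as it happens, it already holds the same state and can take over if the primary fails outright; the journal covers the simpler case where the primary merely reboots.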
The RAID implementation of Panasas is unique in several ways. First, each
file has its own RAID equation, so small files can be mirrored for efficient
small-file updates, and large files can be parity protected for low capacity
overhead and high large-transfer bandwidth. Second, all files are placed in
objects spread over all object servers in such a manner that the failure of
any object server engages every other object server in its reconstruction,
each contributing only a small fraction of the work, as shown in Figure 2.5.
This distributed RAID scheme, known as declustering, enables much higher read
bandwidth during reconstruction and less interference with user workloads than
traditional RAID, which reconstructs a failure with the help of all of the
storage in a small subset of the surviving disks. Panasas RAID also reserves
spares differently from most RAID implementations; it reserves a fraction of
the capacity of all blades
Figure 2.5 Declustered object-based RAID in Panasas storage defines a
different RAID equation for every file, randomizing placement so that all
disks assist in every reconstruction.
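Because each file carries its own RAID equation, the choice between mirroring and parity can be made per file. The sketch below is a hypothetical policy: the `SMALL_FILE_LIMIT` of 64 KiB and the default stripe width of 9 are illustrative assumptions, not values given in the text. It also computes each choice's capacity overhead, which shows why parity suits large files: mirroring stores one extra byte per data byte, while a stripe of N data units plus one parity unit stores only 1/N extra.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the text does not say where Panasas draws the line.
SMALL_FILE_LIMIT = 64 * 1024  # bytes


@dataclass(frozen=True)
class RaidEquation:
    kind: str              # "mirror" or "parity"
    data_units: int        # stripe units holding file data
    redundancy_units: int  # mirror copies or parity units

    def capacity_overhead(self) -> float:
        """Extra capacity stored per unit of file data."""
        return self.redundancy_units / self.data_units


def choose_equation(file_size: int, stripe_width: int = 9) -> RaidEquation:
    """Pick a per-file RAID equation (hypothetical policy)."""
    if stripe_width < 2:
        raise ValueError("a parity stripe needs at least one data and one parity unit")
    if file_size <= SMALL_FILE_LIMIT:
        # Small file: two copies, so an update needs no parity read-modify-write.
        return RaidEquation("mirror", 1, 1)
    # Large file: (stripe_width - 1) data units protected by one parity unit.
    return RaidEquation("parity", stripe_width - 1, 1)


for size in (4 * 1024, 1024**3):
    eq = choose_equation(size)
    print(size, eq.kind, f"{eq.capacity_overhead():.1%}")
# 4096 mirror 100.0%
# 1073741824 parity 12.5%
```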
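The effect of declustering on reconstruction can be seen in a small simulation. The hypothetical `rebuild_reads` function places fixed-width stripes either in fixed groups of consecutive disks (traditional RAID) or on a random subset of all disks chosen per stripe (declustered), then counts how many stripe units each surviving disk must read to rebuild one failed disk. Uniformly random placement is a simplifying assumption, not necessarily the layout Panasas uses.

```python
import random
from collections import Counter


def rebuild_reads(num_disks: int, width: int, num_stripes: int, failed: int,
                  declustered: bool, seed: int = 0) -> Counter:
    """Count stripe units each surviving disk reads to rebuild disk `failed`."""
    if num_disks % width != 0:
        raise ValueError("fixed groups need num_disks to be a multiple of width")
    rng = random.Random(seed)
    num_groups = num_disks // width
    reads = Counter()
    for s in range(num_stripes):
        if declustered:
            # Each stripe picks its own random set of `width` distinct disks.
            disks = rng.sample(range(num_disks), width)
        else:
            # Traditional RAID: stripes rotate over fixed groups of consecutive disks.
            start = (s % num_groups) * width
            disks = list(range(start, start + width))
        if failed in disks:
            # Rebuilding the lost unit reads one unit from every other disk in the stripe.
            for d in disks:
                if d != failed:
                    reads[d] += 1
    return reads


trad = rebuild_reads(20, 5, 2000, failed=0, declustered=False)
decl = rebuild_reads(20, 5, 2000, failed=0, declustered=True)
print(len(trad), max(trad.values()))  # 4 500
print(len(decl), max(decl.values()))
```

With fixed groups, only the four other disks in the failed disk's group do all of the rebuild reads, 500 units each. With random placement, a comparable total amount of work is spread across all 19 survivors, so each one reads far less, which is the source of the higher reconstruction bandwidth and lower interference the text describes.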