Parallel Data Storage and Access - Scientific Data Management

instead of a few dedicated idle spare blades, so that reconstructed data can
be written in parallel to all surviving blades much faster than to a single
replacement blade. Because each file is an independent RAID equation, PanFS
also distributes the rebuild work to all metadata managers and incrementally
puts rebuilt files back into normal mode, rather than waiting until all
data is recovered. Collectively, parallel reconstruction of declustered per-file
RAID into reserved distributed spare space yields reconstructions that get
faster in bigger systems, whereas traditional RAID reconstruction does not
get faster, especially because the amount of work gets larger as the disks get
bigger.
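
The scaling argument above can be illustrated with a deliberately idealized model. The sketch below is not PanFS code: the function name `rebuild_seconds`, the bandwidth figure, and the data size are hypothetical values chosen only for illustration, and the model ignores read, network, and parity-computation costs. It assumes only that the time to write reconstructed data is inversely proportional to the number of blades that accept writes.

```python
def rebuild_seconds(data_bytes, write_bw_per_blade, writers):
    """Idealized time to write reconstructed data.

    Assumption: every writer sustains the same bandwidth and the work is
    spread evenly; read and network costs are ignored.
    """
    if writers < 1:
        raise ValueError("at least one blade must accept rebuilt data")
    return data_bytes / (write_bw_per_blade * writers)


if __name__ == "__main__":
    lost_bytes = 500e9   # hypothetical amount of data on the failed blade
    blade_bw = 50e6      # hypothetical per-blade write bandwidth, bytes/s

    # Traditional rebuild: everything goes to one replacement blade.
    single = rebuild_seconds(lost_bytes, blade_bw, writers=1)
    print(f"single replacement blade: {single:.0f} s")

    # Declustered rebuild: all surviving blades share the writes.
    for total_blades in (10, 50, 100):
        t = rebuild_seconds(lost_bytes, blade_bw, writers=total_blades - 1)
        print(f"{total_blades} blades, distributed spare space: {t:.0f} s")
```

In this model the single-blade time stays fixed as the system grows, while the distributed time shrinks with every added blade, which is the qualitative behavior the text describes.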

A second unique feature of Panasas RAID is that each disk's resistance to
sectors becoming unreadable after they are written is made much stronger by an
additional layer of error-correcting code on each disk. This prevents a media
read error from failing a reconstruction, because the media error is repaired
before the data leaves the storage blade.

A third unique feature of Panasas RAID is that clients can be configured
to read the RAID parity (or mirror) when reading data, in order to verify the
RAID equation. Since PanFS clients compute RAID parity on writing, this
allows end-to-end verification that the data has not been damaged silently,
on the disk or in network or server hardware. Like disk checksums, this
end-to-end parity detects silent corruption, but it protects against a much
wider range of potential silent failures.
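
The idea of verifying the RAID equation on the client can be sketched with simple XOR parity. This is a minimal illustration, not the PanFS implementation: the function names `xor_parity`, `write_stripe`, and `verified_read` are hypothetical, the stripe is held in memory rather than on storage blades, and single-parity XOR is assumed as the RAID equation.

```python
from functools import reduce


def xor_parity(units):
    """Return the byte-wise XOR of equal-length stripe units."""
    if not units:
        raise ValueError("need at least one stripe unit")
    length = len(units[0])
    if any(len(u) != length for u in units):
        raise ValueError("stripe units must have equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def write_stripe(data_units):
    """Client-side write: compute parity before the data leaves the client."""
    return list(data_units), xor_parity(data_units)


def verified_read(data_units, stored_parity):
    """Client-side read: recompute parity and compare with the stored parity."""
    if xor_parity(data_units) != stored_parity:
        raise IOError("RAID equation check failed: silent corruption detected")
    return b"".join(data_units)


if __name__ == "__main__":
    stored, parity = write_stripe([b"abcd", b"efgh", b"ijkl"])
    print(verified_read(stored, parity))

    stored[1] = b"efgX"  # simulate silent damage somewhere along the path
    try:
        verified_read(stored, parity)
    except IOError as err:
        print(err)
```

Because the parity is computed and checked on the client, damage introduced anywhere between the write and the read (disk, network, or server) makes the recomputed parity differ from the stored one.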

The Panasas parallel file system and storage cluster is a high-performance
computing storage technology that embodies many new technological advances,
but it is also an integrated solution designed for ease of use and high
availability.

2.2.2.2 PVFS

The Parallel Virtual File System (PVFS) project began at Clemson University
in the early 1990s as an effort to develop a research parallel file system for use
on cluster computers. Since then, the project has grown into an international
collaboration to design, build, and support an open source parallel file system
for the scientific computing community. The project is led by teams at
Argonne National Laboratory and Clemson University. PVFS is widely used in
production settings in industry, national laboratories, and academia. PVFS
is freely available under an LGPL/GPL license from http://www.pvfs.org,
and it has served as a starting point for many research projects. Currently,
Linux clusters and IBM Blue Gene/L and Blue Gene/P systems are supported,
with preliminary support for Cray XT series systems. TCP/IP, InfiniBand,
Myrinet GM and MX, and Portals networks are natively supported for PVFS
communication.

PVFS was designed before the “object-based” nomenclature became popular,
but the approach is similar to that used by Lustre and other object-based
file systems. File data is striped across multiple file servers, with the stripe
