Parallel Data Storage and Access - Scientific Data Management

realistic to expect an increase in reconstruction time of at least 10% per year. Assuming that reconstruction times today are often about 30 hours and that 3% of drives in a system fail per year on average (as shown in Figure 2.10), the number of concurrent reconstructions can be projected to rise in future high-performance computing systems, as shown in Figure 2.11b. This figure indicates that in the year 2018, on average, nearly 300 concurrent reconstructions may be in progress at any time.
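
The arithmetic behind such a projection can be sketched as a steady-state estimate: the average number of rebuilds in progress is roughly the failure arrival rate multiplied by the rebuild duration. The sketch below uses the 3% annual failure rate, 30-hour rebuild time, and 10% yearly growth quoted above; the function name, the `num_drives` parameter, and the example drive count are hypothetical, since the drive counts behind Figure 2.11b are not given here.

```python
HOURS_PER_YEAR = 24 * 365


def concurrent_reconstructions(num_drives,
                               annual_failure_rate=0.03,
                               rebuild_hours=30.0,
                               rebuild_growth=0.10,
                               years_ahead=0):
    """Estimate the average number of rebuilds in progress at any time.

    Steady-state estimate: failures per hour times hours per rebuild.
    num_drives is a hypothetical input; the other defaults are the
    figures quoted in the text.
    """
    failures_per_hour = num_drives * annual_failure_rate / HOURS_PER_YEAR
    # Rebuild time is assumed to grow by a fixed percentage every year.
    projected_rebuild_hours = rebuild_hours * (1.0 + rebuild_growth) ** years_ahead
    return failures_per_hour * projected_rebuild_hours


# Hypothetical system of 10,000 drives, today and five years out.
today = concurrent_reconstructions(10_000)
in_five_years = concurrent_reconstructions(10_000, years_ahead=5)
```

With the drive count held fixed, the estimate grows only as fast as the rebuild time does; reaching hundreds of concurrent reconstructions would also require the number of drives per system to grow substantially.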

Clearly, designers of petascale storage systems will be spending a large fraction of their efforts on fault tolerance inside the storage systems on which petascale application fault tolerance depends.

2.5.2 Solid State and Hybrid Devices

Rotational delays in disk drives limit our ability to treat these devices as truly “random access.” To attain the highest possible performance from storage systems, software developers must spend a great deal of effort to organize disk accesses to maximize sequential block access at the disk level. Although this is trivial for simple, serial, contiguous access patterns, the complex and concurrent access patterns of HPC applications do not lend themselves to this type of optimization. The result is that few HPC applications ever see the full I/O potential of the parallel data systems they are accessing.

Internally, parallel file systems also need very fast, truly random-access storage for managing metadata and for write-ahead logging. Lowering access latency for these two categories can significantly speed up small accesses, file creates and removals, and statistics gathering.

One technique that has been employed in many enterprise products is the use of battery-backed RAM. In this technique, battery power is used to allow time for committing data from RAM to storage in the event that power is lost. In normal operation, the RAM serves as a low-latency space for storing small amounts of data. However, the cost and complexity of this approach limit its use.

The cost of solid state disk drives (SSDs) has recently dropped to the point where this technology is becoming a viable component in storage systems. With access latencies of about 0.1 ms, these devices respond as much as two orders of magnitude faster than traditional hard drives. By integrating these devices alongside traditional hard drives in a parallel storage system, parallel file systems can dramatically improve the latency of common operations without a significant impact on reliability. Hybrid disk drives are also appearing: traditional drives with hundreds of megabytes of NAND flash storage in the same enclosure.
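
To put the latency figures in perspective, the following minimal sketch bounds how many strictly serialized operations (for example, dependent metadata updates) a device can complete per second when each one must wait for a full device access. The 0.1 ms SSD latency comes from the text; the 10 ms hard drive latency is an assumption derived from the "two orders of magnitude" comparison, and the function name is hypothetical. Queuing, caching, and transfer time are ignored.

```python
def serialized_ops_per_second(latency_ms):
    """Upper bound on back-to-back operations that each wait latency_ms."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


SSD_LATENCY_MS = 0.1   # figure quoted in the text
HDD_LATENCY_MS = 10.0  # assumption: two orders of magnitude higher

ssd_rate = serialized_ops_per_second(SSD_LATENCY_MS)
hdd_rate = serialized_ops_per_second(HDD_LATENCY_MS)
speedup = ssd_rate / hdd_rate  # ratio of the two upper bounds
```

Under these assumptions the bound rises from roughly a hundred operations per second to roughly ten thousand, which is why latency-sensitive paths such as metadata management and write-ahead logging are natural candidates for solid state storage.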

2.5.3 Extreme-Scale Devices

Modern large-scale parallel computers contain tens to hundreds of thousands of processor cores. This level of concurrency has posed enormous challenges
