realistic to expect an increase in reconstruction time of at least 10% per year.
Assuming that today reconstruction times are often about 30 hours and that
3% of drives in a system fail per year on average (as shown in Figure 2.10),
the number of concurrent reconstructions can be projected to rise in future
high-performance computing systems, as shown in Figure 2.11b. This figure
indicates that in the year 2018, on average, nearly 300 concurrent
reconstructions may be in progress at any time.
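The projection follows from a steady-state argument: the average number of
reconstructions in progress equals the rate at which drives fail multiplied by
the time each reconstruction takes. The following is a minimal sketch of that
arithmetic, not the model behind Figure 2.11b. The drive count of 10,000 is a
hypothetical value chosen for illustration, and the function and parameter
names are assumptions.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours


def concurrent_reconstructions(num_drives, annual_failure_rate=0.03,
                               rebuild_hours=30.0, growth_per_year=0.10,
                               years_ahead=0):
    """Average number of reconstructions in progress at any time.

    Steady-state estimate: failure arrival rate times reconstruction time.
    Defaults use the figures quoted in the text (3% of drives failing per
    year, 30-hour rebuilds growing 10% per year).
    """
    failures_per_hour = num_drives * annual_failure_rate / HOURS_PER_YEAR
    rebuild_time = rebuild_hours * (1.0 + growth_per_year) ** years_ahead
    return failures_per_hour * rebuild_time


if __name__ == "__main__":
    drives = 10_000  # hypothetical system size, not a value from the text
    for years in (0, 5, 10):
        n = concurrent_reconstructions(drives, years_ahead=years)
        print(f"+{years:2d} years: {n:.2f} concurrent reconstructions")
```

With these assumptions, a 10,000-drive system averages about one
reconstruction in progress today. Ten years of 10% growth in reconstruction
time multiplies that by roughly 2.6 even if the drive count stays fixed, so
growth in the number of drives accounts for the rest of any larger projection.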
Clearly, designers of petascale storage systems will be spending a large
fraction of their efforts on fault tolerance inside the storage systems on which
petascale application fault tolerance depends.
2.5.2 Solid State and Hybrid Devices
Rotational delays in disk drives limit our ability to treat these devices as truly
“random access.” To attain the highest possible performance from storage
systems, software developers must spend a great deal of effort to organize
disk accesses to maximize sequential block access at the disk level. Although
this is trivial for simple, serial, contiguous access patterns, the complex and
concurrent access patterns of HPC applications do not lend themselves to this
type of optimization. The result is that few HPC applications ever see the full
I/O potential of the parallel data systems they are accessing.
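To see why access ordering matters so much, consider a rough cost model in
which every nonsequential request pays a seek plus, on average, half a disk
revolution before any data is transferred. The sketch below is illustrative
only: the 8 ms seek time, 100 MB/s transfer rate, 7200 RPM spindle speed, and
64 KB request size are assumed values, and the function names are
hypothetical.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational delay: half of one full revolution, in ms."""
    return 0.5 * 60_000.0 / rpm


def random_access_throughput_mb_s(request_kb, transfer_mb_s, seek_ms, rpm):
    """Effective throughput when every request pays seek + rotational delay."""
    positioning_s = (seek_ms + avg_rotational_latency_ms(rpm)) / 1000.0
    request_mb = request_kb / 1024.0
    transfer_s = request_mb / transfer_mb_s
    return request_mb / (positioning_s + transfer_s)


if __name__ == "__main__":
    # All drive parameters below are assumptions chosen for illustration.
    rate = random_access_throughput_mb_s(request_kb=64, transfer_mb_s=100.0,
                                         seek_ms=8.0, rpm=7200)
    print(f"rotational latency: {avg_rotational_latency_ms(7200):.2f} ms")
    print(f"random 64 KB requests: {rate:.1f} MB/s versus 100 MB/s sequential")
```

Under these assumptions, positioning time dominates each small request, and
the drive delivers only a small fraction of its sequential transfer rate.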
Internally, parallel file systems also need very fast, truly random access
storage for use in managing metadata and for write-ahead logging. Lowering
access latency for these two categories can significantly speed small accesses,
file creates and removals, and statistics gathering.
One technique that has been employed in many enterprise products is the
use of battery-backed RAM. In this technique battery power is used to allow
time for committing data from RAM to storage in the event that power is
lost. In normal operation, the RAM serves as a low-latency space for storing
small amounts of data. However, the cost and complexity of this approach
limit its use.
The cost of solid state disk drives (SSDs) has recently dropped to the point
where this technology is becoming a viable component in storage systems.
With latencies of 0.1 ms, these devices respond as much as two orders of
magnitude faster than traditional hard drives. By integrating these
devices alongside traditional hard drives in a parallel storage system, parallel
file systems can dramatically improve latency of common operations without
a significant impact on reliability. Hybrid disk drives are also appearing:
traditional drives with hundreds of megabytes of NAND flash storage in the same
enclosure.
2.5.3 Extreme-Scale Devices
Modern large-scale parallel computers contain tens to hundreds of thousands
of processor cores. This level of concurrency has posed enormous challenges