computation uses up to 6.7 kW/TFLOPS. As mentioned before, Palmetto was
lightly loaded at the time of measurement, so its power consumed per TFLOP
of capability is artificially low.
34.2.6 Extrapolation to Exascale
At 30 kW/PB, Cielo stands as a machine that has extremely efficient
power use for I/O and infrastructure. However, without a change of strategy,
an exascale storage system may consume too much power. To understand why,
consider a common workload on large-scale machines: checkpointing. One
rule of thumb is that an application should spend less than 5% of its execution
time writing checkpoints to storage. With today's disks, this means that
the system purchaser must buy enough disks to provide the necessary bandwidth.
The storage space obtained from the purchase is often far in excess of what is
otherwise required in a scratch file system.
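The 5% rule of thumb can be turned into a bandwidth requirement with a
short calculation. The following is a minimal sketch, not a sizing tool:
the function name, the 1 PB checkpoint size, and the one-hour interval are
hypothetical values chosen for illustration, and the sketch assumes that
the time spent writing is counted as part of the checkpoint interval.

```python
def min_checkpoint_bandwidth(checkpoint_bytes, interval_s, max_fraction=0.05):
    """Return the bandwidth (bytes/s) needed so that writing one checkpoint
    per interval takes no more than max_fraction of that interval."""
    if not 0 < max_fraction <= 1:
        raise ValueError("max_fraction must be in (0, 1]")
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Time the application is allowed to spend writing, per interval.
    write_window_s = max_fraction * interval_s
    return checkpoint_bytes / write_window_s


# Hypothetical inputs: a 1 PB checkpoint written once per hour.
example_bandwidth = min_checkpoint_bandwidth(1e15, 3600)
print(f"required bandwidth: {example_bandwidth / 1e12:.2f} TB/s")
```

Because the write window is only a small fraction of the interval, the
required bandwidth is set by the checkpoint size and the write window, not
by how much scratch space the application otherwise needs.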
This will also hold in the future. The first exascale machine
is estimated to have at least 32 PB of RAM [30]. Disks are projected to
hold 29.52 TB, and have a bandwidth of 384.2 MB/s [18]. The system is
expected to require 320 PB to 1 EB of storage space [30]. Disregarding fault
tolerance measures like RAID, the capacity can be satisfied with 10,847 to
33,898 disks. However, to accept a checkpoint every hour, the system will
need to provide at least 106.7 TB/s of bandwidth. This requires over 277,633
disks, with over 8 EB of capacity! Holding power per disk constant at the
level found in the survey, this machine would require about 13 MW of power,
or 65% of the exascale system's power budget, without extra capacity and
bandwidth for fault tolerance.
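The disk counts above can be checked with a few lines of arithmetic. The
sketch below is illustrative only: the variable names are invented here,
decimal (SI) units are assumed, and the 106.7 TB/s requirement is taken
from the text as given, not derived. The results land close to the quoted
figures but not exactly on them, presumably because the quoted inputs are
rounded. The per-disk power and the total power budget are not stated in
the text; they are back-calculated from the 13 MW and 65% figures.

```python
import math

# Decimal (SI) units, expressed in bytes.
MB, TB, PB, EB = 1e6, 1e12, 1e15, 1e18

# Figures quoted in the text.
DISK_CAPACITY = 29.52 * TB            # projected capacity per disk
DISK_BANDWIDTH = 384.2 * MB           # projected bandwidth per disk, bytes/s
CAPACITY_NEEDED = (320 * PB, 1 * EB)  # low and high storage estimates
BANDWIDTH_NEEDED = 106.7 * TB         # bytes/s, taken as given
STORAGE_POWER_W = 13e6                # about 13 MW
POWER_SHARE = 0.65                    # share of the system power budget


def disks_needed(requirement, per_disk):
    """Smallest whole number of disks whose combined rating meets the requirement."""
    return math.ceil(requirement / per_disk)


# Disks needed for capacity alone (no RAID or other redundancy).
capacity_disks = tuple(disks_needed(c, DISK_CAPACITY) for c in CAPACITY_NEEDED)

# Disks needed for checkpoint bandwidth alone.
bandwidth_disks = disks_needed(BANDWIDTH_NEEDED, DISK_BANDWIDTH)

# Capacity that comes along with the bandwidth-driven purchase.
resulting_capacity_eb = bandwidth_disks * DISK_CAPACITY / EB

# Values implied by the quoted power figures (not stated in the text).
implied_watts_per_disk = STORAGE_POWER_W / bandwidth_disks
implied_budget_mw = STORAGE_POWER_W / POWER_SHARE / 1e6

print(f"disks for capacity:  {capacity_disks[0]:,} to {capacity_disks[1]:,}")
print(f"disks for bandwidth: {bandwidth_disks:,}")
print(f"resulting capacity:  {resulting_capacity_eb:.1f} EB")
print(f"implied power/disk:  {implied_watts_per_disk:.1f} W")
print(f"implied budget:      {implied_budget_mw:.0f} MW")
```

The point of the comparison is the ratio: the bandwidth requirement calls
for roughly eight times as many disks as the largest capacity estimate, so
bandwidth, not capacity, determines the size and power draw of the system.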
Clearly, there will have to be some change from business as usual to
achieve acceptable performance for exascale workloads. One cannot satisfy
exascale bandwidth requirements as outlined above by checkpointing directly
to disks. The next sections explore evolutionary storage technologies,
revolutionary storage technologies, and application technologies that will
allow future workloads to be reasonably satisfied on an exascale machine.
34.3 How I/O Changes at Exascale
Between now and the deployment of the first exascale machine, there will
be a point where it will be economically impossible to build a large,
homogeneous file system that meets both bandwidth and storage requirements
for bulk synchronous checkpoint workloads. The implication is that the file
system's form must change, the application's fault characteristics must
change, or both. This section explores methods in both areas.
 