to control writing—is also its Achilles heel. Flash must use bulk erase-rewrite cycles that are
considerably slower. As a result, several PMDs, such as the Apple iPad, use a relatively small
SDRAM main memory combined with Flash, which acts as both the file system and the page
storage system to handle virtual memory.
In addition, several completely new approaches to memory are being explored. These include MRAMs, which use magnetic storage of data, and phase change RAMs (known as PCRAM, PCME, and PRAM), which use a glass that can be changed between amorphous and crystalline states. Both types of memories are nonvolatile and offer potentially higher densities than DRAMs. These are not new ideas; magnetoresistive memory technologies and phase change memories have been around for decades. Either technology may become an alternative to current Flash; replacing DRAM is a much tougher task. Although the improvements in DRAMs have slowed down, the possibility of a capacitor-free cell and other potential improvements make it hard to bet against DRAMs, at least for the next decade.
For some years, a variety of predictions have been made about the coming memory wall (see quote and paper cited above), which would lead to fundamental decreases in processor performance. However, the extension of caches to multiple levels, more sophisticated refill and prefetch schemes, greater compiler and programmer awareness of the importance of locality, and the use of parallelism to hide what latency remains have helped keep the memory wall at bay. The introduction of out-of-order pipelines with multiple outstanding misses allowed available instruction-level parallelism to hide the memory latency remaining in a cache-based system. The introduction of multithreading and more thread-level parallelism took this a step further by providing more parallelism and hence more latency-hiding opportunities. It is likely that the use of instruction- and thread-level parallelism will be the primary tool to combat whatever memory delays are encountered in modern multilevel cache systems.
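To make the locality point concrete, here is a minimal C sketch of one such programmer-applied optimization, cache blocking (tiling); the function name matmul_tiled, the tile size BLK, and the row-major layout are illustrative assumptions, not details from the text. The blocked loops perform exactly the same arithmetic as a naive triple loop but reuse each small tile many times while it is resident in the cache.

#include <stddef.h>

/* A naive i-j-k multiply streams through B with a large stride, so the
 * inner loop can miss in the cache on every iteration once the matrices
 * outgrow it. Blocking restricts the working set to BLK x BLK tiles
 * that are reused before being evicted. Matrices are row-major, and C
 * must be zero-initialized by the caller. */
#define BLK 64   /* assumed tile size; tune to the cache being targeted */

void matmul_tiled(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BLK)
        for (size_t kk = 0; kk < n; kk += BLK)
            for (size_t jj = 0; jj < n; jj += BLK)
                /* multiply one tile of A by one tile of B */
                for (size_t i = ii; i < ii + BLK && i < n; i++)
                    for (size_t k = kk; k < kk + BLK && k < n; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jj + BLK && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

The i-k-j ordering of the innermost loops also keeps the accesses to B and C at unit stride, which plays to the strengths of the hardware prefetchers mentioned above.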
One idea that periodically arises is the use of programmer-controlled scratchpad or other high-speed memories, which we will see are used in GPUs. Such ideas have never made the mainstream for several reasons: First, they break the memory model by introducing address spaces with different behavior. Second, unlike compiler-based or programmer-based cache optimizations (such as prefetching), memory transformations with scratchpads must completely handle the remapping from main memory address space to the scratchpad address space. This makes such transformations more difficult and limited in applicability. In GPUs (see Chapter 4), where local scratchpad memories are heavily used, the burden for managing them currently falls on the programmer.
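As a concrete illustration of that burden, the following is a minimal CUDA sketch of the usual pattern (the kernel name stencil3 and the tile size TILE are illustrative assumptions, not details from the text): each thread block explicitly stages a tile of the input, plus a one-element halo on each side, from global memory into the on-chip __shared__ scratchpad, synchronizes, and only then computes from the fast copy. The remapping from global addresses to scratchpad indices is written out by hand.

#include <cuda_runtime.h>

#define TILE 256   /* assumed thread-block size; must match the launch */

/* 3-point averaging stencil using programmer-managed shared memory.
 * Each block stages TILE elements plus one halo element per side into
 * the scratchpad before any thread computes from it. */
__global__ void stencil3(const float *in, float *out, int n)
{
    __shared__ float tile[TILE + 2];

    int g = blockIdx.x * blockDim.x + threadIdx.x;  /* global index     */
    int s = threadIdx.x + 1;                        /* scratchpad index */

    tile[s] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x == 0)                 /* left halo  */
        tile[0] = (g > 0) ? in[g - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)    /* right halo */
        tile[TILE + 1] = (g + 1 < n) ? in[g + 1] : 0.0f;

    __syncthreads();   /* tile fully staged before anyone reads it */

    if (g < n)
        out[g] = (tile[s - 1] + tile[s] + tile[s + 1]) / 3.0f;
}

A launch such as stencil3<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n) would run it. In a cache-based system the same computation is just a loop over in[g - 1], in[g], and in[g + 1]; none of the staging, halo handling, or synchronization appears, which is exactly the management burden described above.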
Although one should be cautious about predicting the future of computing technology, history has shown that caching is a powerful and highly extensible idea that is likely to allow us to continue to build faster computers and ensure that the memory hierarchy can deliver the instructions and data needed to keep such systems working well.
2.9 Historical Perspective and References
In Section L.3 (available online) we examine the history of caches, virtual memory, and virtual
machines. IBM plays a prominent role in the history of all three. References for further reading
are included.
 