to control writing—is also its Achilles heel. Flash must use bulk erase-rewrite cycles that are
considerably slower. As a result, several PMDs, such as the Apple iPad, use a relatively small
SDRAM main memory combined with Flash, which acts as both the file system and the page
storage system to handle virtual memory.
In addition, several completely new approaches to memory are being explored. These include MRAMs, which use magnetic storage of data, and phase change RAMs (known as PCRAM, PCME, and PRAM), which use a glass that can be changed between amorphous and crystalline states. Both types of memories are nonvolatile and offer potentially higher densities than DRAMs. These are not new ideas; magnetoresistive memory technologies and phase change memories have been around for decades. Either technology may become an alternative to current Flash; replacing DRAM is a much tougher task. Although the improvements in DRAMs have slowed down, the possibility of a capacitor-free cell and other potential improvements make it hard to bet against DRAMs, at least for the next decade.
For some years, a variety of predictions have been made about the coming memory wall (see quote and paper cited above), which would lead to fundamental decreases in processor performance. However, the extension of caches to multiple levels, more sophisticated refill and prefetch schemes, greater compiler and programmer awareness of the importance of locality, and the use of parallelism to hide what latency remains have helped keep the memory wall at bay. The introduction of out-of-order pipelines with multiple outstanding misses allowed available instruction-level parallelism to hide the memory latency remaining in a cache-based system. The introduction of multithreading and more thread-level parallelism took this a step further by providing more parallelism and hence more latency-hiding opportunities. It is likely that the use of instruction- and thread-level parallelism will be the primary tool to combat whatever memory delays are encountered in modern multilevel cache systems.
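To make the locality point concrete, here is a minimal C sketch of one such programmer-applied optimization, cache blocking (tiling); the function name matmul_tiled, the tile size BLK, and the row-major layout are illustrative assumptions, not details from the text. The blocked loops perform exactly the same arithmetic as a naive triple loop but reuse each small tile many times while it is resident in the cache.

#include <stddef.h>

/* A naive i-j-k multiply streams through B with a large stride, so the
 * inner loop can miss in the cache on every iteration once the matrices
 * outgrow it. Blocking restricts the working set to BLK x BLK tiles
 * that are reused before being evicted. Matrices are row-major, and C
 * must be zero-initialized by the caller. */
#define BLK 64   /* assumed tile size; tune to the cache being targeted */

void matmul_tiled(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BLK)
        for (size_t kk = 0; kk < n; kk += BLK)
            for (size_t jj = 0; jj < n; jj += BLK)
                /* multiply one tile of A by one tile of B */
                for (size_t i = ii; i < ii + BLK && i < n; i++)
                    for (size_t k = kk; k < kk + BLK && k < n; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jj + BLK && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

The i-k-j ordering of the innermost loops also keeps the accesses to B and C at unit stride, which plays to the strengths of the hardware prefetchers mentioned above.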
One idea that periodically arises is the use of programmer-controlled scratchpad or other high-speed memories, which we will see are used in GPUs. Such ideas have never made the mainstream for several reasons: First, they break the memory model by introducing address spaces with different behavior. Second, unlike compiler-based or programmer-based cache optimizations (such as prefetching), memory transformations with scratchpads must completely handle the remapping from main memory address space to the scratchpad address space. This makes such transformations more difficult and limited in applicability. In GPUs (see Chapter 4), where local scratchpad memories are heavily used, the burden for managing them currently falls on the programmer.
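As a concrete illustration of that burden, the following is a minimal CUDA sketch of the usual pattern (the kernel name stencil3 and the tile size TILE are illustrative assumptions, not details from the text): each thread block explicitly stages a tile of the input, plus a one-element halo on each side, from global memory into the on-chip __shared__ scratchpad, synchronizes, and only then computes from the fast copy. The remapping from global addresses to scratchpad indices is written out by hand.

#include <cuda_runtime.h>

#define TILE 256   /* assumed thread-block size; must match the launch */

/* 3-point averaging stencil using programmer-managed shared memory.
 * Each block stages TILE elements plus one halo element per side into
 * the scratchpad before any thread computes from it. */
__global__ void stencil3(const float *in, float *out, int n)
{
    __shared__ float tile[TILE + 2];

    int g = blockIdx.x * blockDim.x + threadIdx.x;  /* global index     */
    int s = threadIdx.x + 1;                        /* scratchpad index */

    tile[s] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x == 0)                 /* left halo  */
        tile[0] = (g > 0) ? in[g - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)    /* right halo */
        tile[TILE + 1] = (g + 1 < n) ? in[g + 1] : 0.0f;

    __syncthreads();   /* tile fully staged before anyone reads it */

    if (g < n)
        out[g] = (tile[s - 1] + tile[s] + tile[s + 1]) / 3.0f;
}

A launch such as stencil3<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n) would run it. In a cache-based system the same computation is just a loop over in[g - 1], in[g], and in[g + 1]; none of the staging, halo handling, or synchronization appears, which is exactly the management burden described above.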
Although one should be cautious about predicting the future of computing technology, history has shown that caching is a powerful and highly extensible idea that is likely to allow us to continue to build faster computers and ensure that the memory hierarchy can deliver the instructions and data needed to keep such systems working well.
2.9 Historical Perspective and References
In Section L.3 (available online) we examine the history of caches, virtual memory, and virtual
machines. IBM plays a prominent role in the history of all three. References for further reading
are included.
 