Digital Signal Processing Reference
processor die, which can directly increase the system parallelism and potentially
improve the overall computing system performance without increasing the chip
footprint. In this context, the 3D DRAM has a heterogeneous structure and covers
two levels of the entire memory hierarchy. Because the L2 cache demands very short
access latency, the 3D DRAM L2 cache must be specifically customized to achieve
an access latency comparable to that of its on-chip SRAM counterpart.
5.2 3D DRAM L2 Cache
The 3D VLIW architecture configuration shown in Fig. 3b migrates the on-chip
L2 cache into the 3D DRAM domain. Since L2 cache access latency plays a critical
role in determining overall computing system performance, one may intuitively
argue that, compared with an on-chip SRAM L2 cache, a 3D DRAM L2 cache
suffers from much longer access latency and hence causes significant performance
degradation. In this section, we show that this intuitive argument does not
necessarily hold. In particular, as we increase the L2 cache capacity and the
number of DRAM dies, the 3D DRAM L2 cache can achieve an access latency
comparable to, or even shorter than, that of an SRAM L2 cache.
Commercial DRAM is typically much slower than SRAM mainly because, as a
commodity product, DRAM has always been optimized for density and cost rather
than speed. DRAM speed can be greatly improved, at the expense of density and
fabrication cost, by two approaches:
1. We can reduce the size of each individual DRAM sub-array to reduce the memory
access latency, at a penalty in storage density. With shorter word-lines and
bit-lines, a smaller DRAM sub-array directly reduces access latency because of
the lighter load presented to the peripheral circuits.
2. We can adopt the multiple threshold voltage (multi-Vth) technique that has been
widely used in logic circuit design [30], i.e., we still use high-Vth transistors in
the DRAM cells to keep the cell leakage current sufficiently low, while using
low-Vth transistors in the peripheral circuits and H-tree buffers to reduce
latency. Such a multi-Vth design is not typically used in commodity DRAM, since
it increases the leakage power consumption of the peripheral circuits and, more
importantly, complicates the DRAM fabrication process and hence incurs a
higher cost.
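The first trade-off above can be illustrated with a toy distributed-RC (Elmore) delay model: word-line and bit-line delay grows roughly quadratically with line length, so a smaller sub-array is faster at the cost of more peripheral-circuit area per bit. The per-cell resistance, capacitance, and sensing-time constants below are illustrative assumptions, not values from this chapter.

```python
def subarray_latency_ns(cells_per_line, r_per_cell=10.0, c_per_cell=0.5e-15,
                        sense_ns=2.0):
    """Rough access-latency estimate for a square DRAM sub-array.

    Distributed-RC (Elmore) delay of a line with n cells scales as
    ~ (n*r) * (n*c) / 2, i.e. quadratically in line length.
    All constants are assumed for illustration.
    """
    rc_seconds = 0.5 * (cells_per_line * r_per_cell) * (cells_per_line * c_per_cell)
    line_ns = rc_seconds * 1e9
    # one word-line traversal + one bit-line traversal + sensing
    return 2 * line_ns + sense_ns

for n in (256, 512, 1024):
    print(f"{n}x{n} sub-array: ~{subarray_latency_ns(n):.2f} ns")
```

Under this model, quadrupling the sub-array dimension multiplies the wire-delay component by roughly sixteen, which is why shrinking sub-arrays is such an effective (if density-hungry) latency lever.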
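The second trade-off can be sketched with two standard first-order device relations: the alpha-power law for gate delay and the exponential dependence of subthreshold leakage on Vth. The supply voltage, alpha exponent, and subthreshold-slope constants below are assumptions chosen only to show the direction and rough magnitude of the effect.

```python
import math

VDD = 1.2           # assumed supply voltage (V)
ALPHA = 1.3         # assumed velocity-saturation exponent
N_VT = 1.5 * 0.026  # assumed subthreshold slope factor * thermal voltage (V)

def rel_delay(vth):
    """Relative gate delay ~ VDD / (VDD - Vth)^alpha (alpha-power law)."""
    return VDD / (VDD - vth) ** ALPHA

def rel_leakage(vth):
    """Relative subthreshold leakage ~ exp(-Vth / (n * vT))."""
    return math.exp(-vth / N_VT)

# compare a low-Vth peripheral transistor (0.3 V) against a high-Vth one (0.5 V)
for vth in (0.5, 0.3):
    print(f"Vth={vth} V: delay x{rel_delay(vth) / rel_delay(0.5):.2f}, "
          f"leakage x{rel_leakage(vth) / rel_leakage(0.5):.0f}")
```

Lowering Vth in the peripheral circuits buys a modest speedup while leakage grows by orders of magnitude, which matches the chapter's point: acceptable in latency-critical cache peripherals, but unattractive for commodity DRAM.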
Moreover, as we increase the L2 cache capacity, global routing plays an
increasingly large role in the overall L2 cache access latency. The 3D DRAM
design strategy presented above directly reduces the latency incurred by global
routing, which further helps reduce the 3D DRAM L2 cache access latency relative
to a 2D SRAM L2 cache.
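The global-routing effect can be sketched geometrically: a cache of fixed total area spread over d stacked dies has a per-die footprint that shrinks by a factor of d, so a buffered H-tree whose delay grows roughly linearly with wire length shortens accordingly, at the cost of a short vertical (TSV) hop per die crossing. The area-per-MB, wire-delay, and TSV-delay constants below are assumptions for illustration, not figures from this chapter.

```python
import math

def routing_latency_ns(capacity_mb, dies, ns_per_mm=0.1, tsv_ns=0.05,
                       mm2_per_mb=4.0):
    """Toy model of global (H-tree) routing delay for a stacked cache.

    Assumptions: repeated wires give delay linear in length; the H-tree
    trunk length scales with the per-die footprint's side; each extra
    die adds one short TSV hop. All constants are illustrative.
    """
    area_mm2 = capacity_mb * mm2_per_mb / dies   # footprint per die
    wire_mm = 2 * math.sqrt(area_mm2)            # ~H-tree traversal length
    return wire_mm * ns_per_mm + (dies - 1) * tsv_ns

for d in (1, 2, 4):
    print(f"{d} die(s), 8 MB cache: ~{routing_latency_ns(8, d):.2f} ns")
```

Because TSV hops are far cheaper than millimeters of planar wire under these assumptions, the routing delay falls as dies are added, and the advantage grows with cache capacity.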
To evaluate the above arguments, Fig. 4 compares the access latency of 2D SRAM,
single-Vth 2D DRAM, and multi-Vth 2D DRAM under different L2 cache capacities
at the 65 nm node. The results show that, as we increase the capacity of