Fig. 3 Illustration of (a) straightforward 3D DRAM stacking and (b) migrating the on-chip L2 cache into the 3D DRAM domain
explicitly exploit parallelism, unlike RISC and CISC processors, which rely on
hardware to discover parallelism on the fly. The compiler schedules operations
based on expected architectural latencies, including instruction execution time
and memory access time. Because most signal processing applications are
data-intensive, VLIW digital signal processors demand careful design and
optimization of the cache and memory hierarchy, which has been extensively studied
(e.g., see [2, 20, 22, 23, 77]). Intuitively, integrating a VLIW digital signal
processor with high-capacity DRAM through 3D integration technologies can further
improve memory hierarchy performance and hence overall system performance.
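The compile-time scheduling described above can be sketched as a greedy list scheduler. This is a minimal illustration under stated assumptions, not the scheduler of any particular VLIW compiler: the `Op` class, the `list_schedule` and `example_ops` functions, the issue width, and all operation latencies are hypothetical, and the dependence graph is assumed to be acyclic. In each cycle the scheduler packs up to `issue_width` operations into one bundle, choosing those whose producers have issued and whose expected latencies have elapsed; a cycle with nothing ready becomes an empty (NOP) bundle.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    latency: int  # hypothetical expected latency in cycles (>= 1), known to the compiler
    deps: list = field(default_factory=list)  # names of producer ops


def list_schedule(ops, issue_width):
    """Greedily pack ops into VLIW bundles, one bundle per cycle.

    An op may issue once every producer has issued and that producer's
    expected latency has elapsed. Program order serves as the priority.
    Assumes an acyclic dependence graph.
    """
    by_name = {op.name: op for op in ops}
    for op in ops:
        for dep in op.deps:
            if dep not in by_name:
                raise ValueError(f"{op.name} depends on unknown op {dep}")
    issue_cycle = {}  # op name -> cycle in which it was issued
    bundles = []
    cycle = 0
    while len(issue_cycle) < len(ops):
        bundle = []
        for op in ops:
            if op.name in issue_cycle or len(bundle) == issue_width:
                continue
            ready = all(
                dep in issue_cycle
                and issue_cycle[dep] + by_name[dep].latency <= cycle
                for dep in op.deps
            )
            if ready:
                bundle.append(op.name)
        for name in bundle:
            issue_cycle[name] = cycle
        bundles.append(bundle)  # an empty bundle is a NOP cycle
        cycle += 1
    return bundles


def example_ops(load_latency):
    """Hypothetical fragment: store(ld_a * ld_b + ld_c)."""
    return [
        Op("ld_a", load_latency),
        Op("ld_b", load_latency),
        Op("mul", 2, ["ld_a", "ld_b"]),
        Op("ld_c", load_latency),
        Op("add", 1, ["mul", "ld_c"]),
        Op("st", 1, ["add"]),
    ]


for lat in (3, 10):
    schedule = list_schedule(example_ops(lat), issue_width=2)
    print(f"load latency {lat}: {len(schedule)} bundles -> {schedule}")
```

With the hypothetical 3-cycle load latency the fragment fits in 7 bundles; raising the expected load latency to 10 cycles, as a crude stand-in for slower memory access, stretches it to 14 bundles, most of them empty. Because the schedule is fixed at compile time from expected latencies, longer memory latency shows up directly as idle issue slots.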
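One way to make the link between memory hierarchy performance and overall system performance concrete is the standard average memory access time (AMAT) model for a two-level cache hierarchy. The sketch below is a back-of-the-envelope model, not a result from this chapter: the function names and every latency, miss rate, line size, and bandwidth are hypothetical, and the model ignores contention, prefetching, and write-backs. The L2 miss penalty is modeled as DRAM access latency plus line transfer time, so it reflects both the lower latency and the higher bandwidth that 3D DRAM stacking is expected to provide.

```python
def miss_penalty(access_latency, line_bytes, bytes_per_cycle):
    """Cycles to service one L2 miss: DRAM access latency plus line transfer time."""
    return access_latency + line_bytes / bytes_per_cycle


def amat(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, l2_penalty):
    """Average memory access time, in cycles, for an L1 + shared L2 hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_local_miss_rate * l2_penalty)


# Hypothetical parameters for illustration only (not measured values).
L1_HIT, L1_MISS_RATE = 1, 0.05
L2_HIT, L2_LOCAL_MISS_RATE = 10, 0.2
LINE_BYTES = 64

# Off-chip DRAM: long access latency, narrow interface.
off_chip = amat(L1_HIT, L1_MISS_RATE, L2_HIT, L2_LOCAL_MISS_RATE,
                miss_penalty(access_latency=200, line_bytes=LINE_BYTES, bytes_per_cycle=8))

# 3D-stacked DRAM: shorter latency, wide TSV-based interface.
stacked = amat(L1_HIT, L1_MISS_RATE, L2_HIT, L2_LOCAL_MISS_RATE,
               miss_penalty(access_latency=60, line_bytes=LINE_BYTES, bytes_per_cycle=64))

print(f"off-chip AMAT: {off_chip:.2f} cycles, 3D-stacked AMAT: {stacked:.2f} cycles")
```

With these made-up parameters, cutting the miss latency from 200 to 60 cycles and widening the interface from 8 to 64 bytes per cycle lowers AMAT from about 3.58 to about 2.11 cycles; the actual benefit depends on each workload's miss rates.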
5.1 3D DRAM Stacking in VLIW Digital Signal Processors
In current design practice, a VLIW digital signal processor typically has a shared
on-chip L2 cache and connects to an off-chip DRAM, so each L2 cache miss can incur
a significant penalty due to the long latency of off-chip data access. We consider
two obvious options for designing VLIW digital signal processors with 3D DRAM
stacking:
1. The most straightforward design option is to stack a VLIW digital signal
processor die directly with 3D DRAM, as illustrated in Fig. 3a. This can be
viewed as moving the off-chip DRAM into the processor chip package. This option
directly reduces the on-chip L2 cache miss penalty, because 3D integration greatly
shortens the processor-DRAM access latency. Moreover, since 3D integration enables
massive inter-die interconnect bandwidth through TSVs, the cache-memory
communication bandwidth can be greatly increased, which further improves overall
system performance.
2. Beyond this straightforward option, we further study the potential of
migrating the shared on-chip L2 cache into the 3D DRAM domain. As illustrated
in Fig. 3b, this makes it possible to put more clusters on the VLIW digital signal
 