Digital Signal Processing Reference
200 MHz
Number of DRAM dies | 6
Die size | 58.5 mm²
Parameter of one cluster | ALU number: 4; register file: 2 kB multi-ported; area size: 2.77 mm²; technology: 90-nm
2D SRAM instruction cache (per cluster) | 4 kB, 2 way, 16 byte blocks; access latency: 1.26 ns
2D SRAM data cache (per cluster) | 4 kB, 2 way, 16 byte blocks; access latency: 1.26 ns
Shared 2D SRAM L2 cache | 256 kB, 2 way, 32 byte blocks; access latency: 2.43 ns; area: 5.69 mm²; technology: 90-nm
Shared 3D DRAM L2 cache | 512 kB, 2 way, 32 byte blocks; access latency: 4.12 ns
Shared main memory | 1 GB, 4 kB page size; access latency: 22.5 ns; technology: 65-nm
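
The latencies in the table above can be combined into a rough average memory access time (AMAT) estimate for the two L2 options. The following is only a minimal sketch: the latencies come from the table, but the miss rates are hypothetical values chosen for illustration, and the function name `amat_ns` is our own. It does not reproduce the simulation methodology of the chapter.

```python
# Latencies in nanoseconds, taken from the configuration table.
L1_LATENCY_NS = 1.26           # 2D SRAM L1 data cache (per cluster)
SRAM_L2_LATENCY_NS = 2.43      # shared 2D SRAM L2 cache, 256 kB
DRAM_L2_LATENCY_NS = 4.12      # shared 3D DRAM L2 cache, 512 kB
MAIN_MEMORY_LATENCY_NS = 22.5  # shared main memory


def amat_ns(l1_miss_rate, l2_miss_rate, l2_latency_ns):
    """Average memory access time for a two-level cache hierarchy."""
    # Cost of an L1 miss: the L2 access plus the expected main-memory access.
    l2_penalty = l2_latency_ns + l2_miss_rate * MAIN_MEMORY_LATENCY_NS
    return L1_LATENCY_NS + l1_miss_rate * l2_penalty


# HYPOTHETICAL miss rates (assumptions for illustration, not from the source).
L1_MISS = 0.10
SRAM_L2_MISS = 0.20  # smaller 256 kB SRAM L2
DRAM_L2_MISS = 0.10  # larger 512 kB 3D DRAM L2

sram_amat = amat_ns(L1_MISS, SRAM_L2_MISS, SRAM_L2_LATENCY_NS)
dram_amat = amat_ns(L1_MISS, DRAM_L2_MISS, DRAM_L2_LATENCY_NS)
print(f"SRAM L2: {sram_amat:.3f} ns, 3D DRAM L2: {dram_amat:.3f} ns")
```

Under these assumed miss rates the slower but larger 3D DRAM L2 cache is competitive with the SRAM L2 cache. With different miss rates the comparison can go either way, which is why the parameters need to be measured on real benchmarks.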
5.3 Performance Evaluation
To evaluate the two 3D VLIW digital signal processor architectures presented above, we carry
out simulations over a wide range of signal processing benchmarks. Trimaran is an
integrated compilation and performance monitoring infrastructure and covers HPL [...]
to strengthen the memory system simulation capability of Trimaran. For the 3D
VLIW processor and DRAM integration, we assume that the VLIW processor and
3D DRAM are designed using 90- and 65-nm technologies, respectively. For the
VLIW processor, each cluster contains 4 ALUs, a 2 kB multi-ported register file,
a private 4 kB instruction cache, and a private 4 kB data cache. All the clusters share one
256 kB L2 cache.
[...] implementation of a VLIW digital signal processor at the 0.13-μm technology node similar [...]
ALUs and a 16 kB single-ported register file, occupies about 3.2 mm², and the entire
VLIW digital signal processor contains 16 clusters and occupies 155 mm² (note
that besides the 16 clusters, it contains two MIPS CPUs and a few other peripheral
circuits). Accordingly, we estimate that one cluster with 4 ALUs and a 2 kB
multi-ported register file is about 2.77 mm² at the 90-nm technology node, the 256 kB L2
cache is 5.69 mm², and the entire VLIW processor die size is 58.5 mm². Therefore,
when we move the 256 kB L2 cache into the 3D DRAM domain, we can re-allocate
its silicon area (5.69 mm², roughly the area of two 2.77 mm² clusters) to two
additional clusters. With a 58.5 mm² die size, six stacked DRAM dies can achieve
a total storage capacity of 1 GB with a 128-bit output width at the 65-nm node.
Table 2 lists all the basic configuration parameters used in the simulations.
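
The area arithmetic behind the re-allocation can be checked with a short calculation. This is a sketch under stated assumptions: the area and capacity figures are those quoted in the text, while the per-die density estimate assumes that every DRAM die has the same 58.5 mm² footprint as the processor die and that the whole footprint is usable for storage. The source does not state either assumption.

```python
import math

CLUSTER_AREA_MM2 = 2.77   # one cluster at 90 nm (from the text)
L2_SRAM_AREA_MM2 = 5.69   # 256 kB SRAM L2 cache at 90 nm (from the text)
DIE_AREA_MM2 = 58.5       # VLIW processor die size (from the text)
DRAM_DIES = 6             # number of stacked DRAM dies (from the text)
DRAM_CAPACITY_MB = 1024   # 1 GB total storage capacity (from the text)

# How many whole clusters fit in the area freed by removing the SRAM L2 cache.
extra_clusters = math.floor(L2_SRAM_AREA_MM2 / CLUSTER_AREA_MM2)
leftover_mm2 = L2_SRAM_AREA_MM2 - extra_clusters * CLUSTER_AREA_MM2

# ASSUMPTION: capacity is split evenly and each die matches the processor footprint.
capacity_per_die_mb = DRAM_CAPACITY_MB / DRAM_DIES
density_mbit_per_mm2 = capacity_per_die_mb * 8 / DIE_AREA_MM2

print(f"extra clusters: {extra_clusters}, leftover area: {leftover_mm2:.2f} mm^2")
print(f"per-die capacity: {capacity_per_die_mb:.1f} MB, "
      f"density: {density_mbit_per_mm2:.1f} Mbit/mm^2")
```

The freed L2 area holds two clusters with a small remainder, which is consistent with the two additional clusters mentioned in the text.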
We use the conventional system architecture with off-chip DRAM as the baseline
configuration, in which the number of clusters is 2 and the L2 cache size is 256 kB.