efficiently executing common mathematical computations used in signal processing
[45]. Because it can effectively exploit the instruction-level parallelism (ILP)
abundant in many signal processing algorithms, the VLIW (very long instruction word)
architecture is widely used to build modern programmable digital signal
processors, although it tends to require more program memory.
Meanwhile, enabled by continuous technology scaling, programmable digital signal
processors are quickly adopting multi-core architecture in order to better handle
various high-end communication and multimedia applications. As a result, similar
to the memory wall problem in general-purpose processors, programmable digital
signal processors also experience an increasingly significant gap between on-chip
processing speed and off-chip memory access speed/bandwidth. Hence, on-chip
cache memory is also being used in programmable digital signal processors.
However, programmable digital signal processors are typically used in
embedded systems with real-time constraints, and the direct use of cache may
introduce uncertainty in program execution time; as a result, the design and use of
on-chip cache memory in programmable digital signal processors can be very
sophisticated and have been well studied (e.g., see [2, 20, 22, 23, 59, 77]).
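As a concrete illustration of the ILP argument, the following minimal C sketch (purely illustrative; the function name, unroll factor, and data type are assumptions, not taken from the text or from any particular processor) unrolls a dot-product kernel so that its independent multiply-accumulates are exposed for a VLIW compiler to pack into wide instruction words. The code growth caused by such unrolling and software pipelining also hints at why VLIW processors tend to require more program memory.

#include <stddef.h>

/* Dot product with four independent partial sums.  Because s0..s3 carry
 * no dependences on each other, the four multiply-accumulates in each
 * iteration can be issued in the same very long instruction word,
 * provided enough functional units are available. */
float dot_product(const float *a, const float *b, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)   /* handle any remaining elements */
        s0 += a[i] * b[i];
    return s0 + s1 + s2 + s3;
}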
Intuitively, 3D logic-memory integration provides many opportunities
to greatly improve programmable digital signal processor performance,
including but certainly not limited to the following:
• The 3D stacked memory can serve as the entire cache memory hierarchy with
much reduced memory access latency. This can directly reduce memory
system design complexity and the energy consumption induced by data movement
across the cache hierarchy.
• With the high-capacity memory storage, one could possibly design a cache
memory system with much improved program execution time predictability. One
possible solution is coarse-grained caching of the program code, e.g.,
caching an entire subroutine instead of fine-grained cache lines as in current design
practice; by leveraging the large cache capacity enabled by 3D memory stacking,
this can largely reduce run-time cache misses and hence improve execution time
predictability (see the first sketch after this list).
• With sufficient 3D stacked memory storage capacity, signal processing data-flow
diagram synthesis and scheduling may potentially be carried out much more
efficiently. Modern data-flow diagram synthesis and scheduling techniques are
typically memory constrained and inevitably involve certain design trade-offs; 3D
logic-memory integration may provide a unique opportunity to develop much
better signal processing data-flow diagram synthesis and scheduling solutions
(see the second sketch after this list).
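To illustrate the predictability argument in the second item above, the following toy C model (an assumption-laden sketch, not a design from the text: the cache geometry, routine addresses, and call pattern are all made up) counts instruction-fetch misses for two subroutines that conflict in a small line-grained direct-mapped cache, and contrasts this with a coarse-grained policy that loads each whole routine once into an ample 3D-stacked cache.

#include <stdio.h>

#define LINE_SIZE 32u    /* bytes per cache line                     */
#define NUM_LINES 64u    /* small direct-mapped I-cache: 2 KB total  */

static unsigned tags[NUM_LINES];
static int      valid[NUM_LINES];

/* One instruction fetch under conventional line-grained caching;
 * returns 1 on a miss, 0 on a hit. */
static int fetch_line_grained(unsigned addr)
{
    unsigned line = addr / LINE_SIZE;
    unsigned idx  = line % NUM_LINES;
    unsigned tag  = line / NUM_LINES;
    if (!valid[idx] || tags[idx] != tag) {
        valid[idx] = 1;
        tags[idx]  = tag;
        return 1;
    }
    return 0;
}

int main(void)
{
    /* Two 1 KB subroutines whose code maps onto the same cache lines,
     * so in the small cache they keep evicting each other. */
    const unsigned base[2] = { 0x0000u, 0x8000u };
    const unsigned size    = 1024u;

    unsigned long line_misses = 0, routine_loads = 0;
    int resident[2] = { 0, 0 };   /* coarse-grained: routine loaded?  */

    for (int iter = 0; iter < 100; iter++) {
        for (int r = 0; r < 2; r++) {
            /* Coarse-grained policy: load the whole routine once; a
             * 3D-stacked cache large enough to keep it resident then
             * guarantees hits for every later fetch inside it. */
            if (!resident[r]) { routine_loads++; resident[r] = 1; }

            /* Line-grained policy: fetch every instruction word. */
            for (unsigned a = base[r]; a < base[r] + size; a += 4)
                line_misses += fetch_line_grained(a);
        }
    }
    printf("line-grained misses   : %lu\n", line_misses);
    printf("routine-grained loads : %lu\n", routine_loads);
    return 0;
}

With line-grained caching the miss count depends on how the two routines happen to interleave and conflict, whereas the coarse-grained policy incurs a fixed, input-independent number of bulk loads, which is what makes worst-case execution time easier to bound.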
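Similarly, for the data-flow scheduling item, the sketch below (again a hypothetical example; the token rates P and C are arbitrary) compares the edge-buffer requirement of a single-appearance schedule against an interleaved, demand-driven schedule for a two-actor synchronous data-flow edge. The trade-off it prints, smaller code versus smaller buffers, is exactly the kind of memory-driven compromise that abundant 3D-stacked memory can relax.

#include <stdio.h>

#define P 3   /* tokens produced per firing of actor A */
#define C 2   /* tokens consumed per firing of actor B */

static unsigned gcd(unsigned a, unsigned b) { return b ? gcd(b, a % b) : a; }

int main(void)
{
    unsigned g  = gcd(P, C);
    unsigned qa = C / g, qb = P / g;        /* repetition vector */

    /* Single-appearance schedule (qa * A)(qb * B): all of A's firings
     * happen before any of B's, so the edge buffer peaks at qa * P. */
    unsigned buf_single = qa * P;

    /* Interleaved schedule: fire B whenever it has enough tokens,
     * otherwise fire A; track the peak buffer occupancy. */
    unsigned tokens = 0, peak = 0, fa = 0, fb = 0;
    while (fa < qa || fb < qb) {
        if (fb < qb && tokens >= C) { tokens -= C; fb++; }
        else                        { tokens += P; fa++; }
        if (tokens > peak) peak = tokens;
    }

    printf("repetitions: A=%u, B=%u\n", qa, qb);
    printf("single-appearance schedule buffer: %u tokens\n", buf_single);
    printf("interleaved schedule buffer      : %u tokens\n", peak);
    return 0;
}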
In sharp contrast to programmable processors, application-specific signal pro-
cessing IC design typically involves very close interaction between the specific signal
processing algorithm design and the VLSI architecture design/optimization, which
enables much greater design flexibility and a wider trade-off spectrum. Under the 3D
logic-memory integration framework, the high memory storage capacity together with
massive logic-memory interconnect bandwidth naturally opens up much more space
for exploring signal processing algorithm and logic architecture design. Meanwhile,