Digital Signal Processing Reference
Fig. 1 Common configurations of systolic architecture: (a) linear array, (b) rectangular array, and
(c) hexagonal array
Several key architectural concerns influenced the development of systolic
architecture [13]:
(a) Simple and regular design: To reduce design complexity and design cost, and
to improve testability and fault tolerance, it is argued that a VLSI architecture
should consist of simple modules (cores, PEs, etc.) organized in regular arrays.
(b) Concurrency and communication: Concurrent computing is essential to achieve
high performance while conserving power. On-chip communication must be
constrained to be local and regular to minimize the overhead caused by long
wires, long delays, and high power consumption.
(c) Balanced on-chip computation rate and on/off-chip data input/output rate:
Moving data on and off chip remains a communication bottleneck of modern
VLSI chips. A sensible architecture must balance the on-chip computation rate
against the demand for on/off-chip data I/O to maximize the utilization of the
available computing resources.
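The balance described in concern (c) can be illustrated with a back-of-the-envelope estimate. The sketch below is not taken from [13]; it uses a simple roofline-style bound, and the function name and parameters (peak_ops_per_s, io_words_per_s, ops_per_word) are hypothetical names chosen for illustration. It assumes that every word moved on or off chip supports a fixed number of operations and that computation and I/O overlap perfectly.

```python
def utilization(peak_ops_per_s, io_words_per_s, ops_per_word):
    """Estimate the fraction of peak compute that off-chip I/O can sustain.

    Assumption (illustrative only): each word transferred on/off chip
    enables `ops_per_word` operations, and I/O overlaps fully with compute.
    """
    # The array cannot compute faster than its data arrives.
    attainable = min(peak_ops_per_s, io_words_per_s * ops_per_word)
    return attainable / peak_ops_per_s


# An I/O-bound case: data reuse is too low to keep the array busy.
print(utilization(peak_ops_per_s=100.0, io_words_per_s=10.0, ops_per_word=5.0))

# A balanced (compute-bound) case: more reuse per word restores full utilization.
print(utilization(peak_ops_per_s=100.0, io_words_per_s=10.0, ops_per_word=20.0))
```

Under these assumptions, raising the number of operations performed per transferred word, which is what a systolic array does by passing each datum through many PEs, is what moves a design from the first case toward the second.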
Systolic arrays were proposed for implementing application-specific computing
systems. Toward this goal, one must map the computing algorithm onto a systolic
array. This requirement stimulated two complementary research directions, both of
which have produced numerous significant results. The first research direction
is to reformulate existing computing algorithms, or develop novel computing
algorithms, so that they can be mapped onto a systolic architecture and enjoy the
benefits of systolic computing. The second research direction is to develop a
systematic design methodology that automates the process of algorithm mapping.
Section 2 of this chapter provides a brief overview of the systolic algorithms that
have been proposed. Section 3 reviews the formal design methodologies developed for
automated systolic array mapping.
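To make the idea of mapping an algorithm onto a systolic array concrete, the following is a minimal sketch of one possible mapping of an FIR filter, y[n] = sum_k w[k] x[n-k], onto a linear array. It is an illustration under stated assumptions, not the specific design of any cited reference: each PE holds one stationary weight, input samples move to the right with two register delays per PE, and partial sums move to the right with one register delay per PE, so all communication is between neighbors. The function names systolic_fir and direct_fir are hypothetical.

```python
def systolic_fir(weights, samples):
    """Cycle-by-cycle simulation of a linear systolic array for FIR filtering.

    Assumed mapping (illustrative): PE i stores weights[i]; x passes through
    two registers per PE, y passes through one register per PE.
    """
    k = len(weights)
    x_reg1 = [0] * k  # first x register inside PE i
    x_reg2 = [0] * k  # second x register inside PE i (feeds PE i+1)
    y_reg = [0] * k   # partial-sum register of PE i (feeds PE i+1)
    outputs = []
    total_cycles = len(samples) + k - 1  # extra cycles drain the pipeline

    for t in range(total_cycles):
        x_in = samples[t] if t < len(samples) else 0
        # All PEs update together from the current state (one global clock edge).
        new_x1, new_x2, new_y = [0] * k, [0] * k, [0] * k
        for i in range(k):
            x_i = x_in if i == 0 else x_reg2[i - 1]  # from left neighbor only
            y_i = 0 if i == 0 else y_reg[i - 1]      # from left neighbor only
            new_y[i] = y_i + weights[i] * x_i        # multiply-accumulate
            new_x1[i] = x_i
            new_x2[i] = x_reg1[i]
        x_reg1, x_reg2, y_reg = new_x1, new_x2, new_y
        if t >= k - 1:  # the first result leaves the last PE after k cycles
            outputs.append(y_reg[k - 1])
    return outputs


def direct_fir(weights, samples):
    """Reference implementation: direct evaluation of the FIR sum."""
    return [
        sum(w * samples[n - j] for j, w in enumerate(weights) if n - j >= 0)
        for n in range(len(samples))
    ]


if __name__ == "__main__":
    w = [1, 2, 3]
    x = [1, 0, 0, 0]
    print(systolic_fir(w, x))  # impulse response of the filter
    print(direct_fir(w, x))    # should agree with the line above
```

The two different delays are what let each input sample meet the right partial sum at the right PE. The simulation also reflects the globally synchronized timing model: every register in the array is updated on the same clock step.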
Systolic array computing was developed based on a globally synchronized,
fine-grained, pipelined timing model. It requires a global clock distribution network
free of clock skew to distribute the clock signal over the entire systolic array.
Recognizing the technical challenge of developing a large-scale clock distribution
network, Kung et al. [14-16] proposed a self-timed, data-flow-based wavefront array