Digital Signal Processing Reference
processor architecture that promises to alleviate the stringent timing constraint
imposed by the global clock synchronization requirement. In Sect. 4, the wavefront
array architecture and its related design methodology will be discussed.
These architectural features of systolic arrays have motivated numerous developments
in research and commercial computing architectures. Notable examples
include the WARP and iWARP projects at CMU [1, 3, 7, 10]; the Transputer™ of
INMOS [8, 19, 22, 26]; and the TMS320C40 DSP processor of Texas Instruments [23].
In Sect. 5 of this chapter, these systolic-array-motivated computing
architectures will be briefly surveyed.
While the notion of the systolic array was first proposed three decades ago, its
impact is still felt today. Modern applications of the systolic array concept
can be found in field-programmable gate array (FPGA) chip architectures and in
network-on-chip (NoC) mesh-array multi-core architectures. Computation-intensive
special-purpose architectures, such as those for the discrete cosine transform and
block motion estimation algorithms in video coding standards, as well as QR
factorization for least-squares filtering in wireless communication standards, have
been incorporated into embedded chip designs. These recent real-world applications
of systolic array architecture will be discussed in Sect. 6.
2 Systolic Array Computing Algorithms
A systolic array exhibits the characteristics of parallelism (pipelining), regularity,
and local communication. A large number of signal processing algorithms and
numerical linear algebra algorithms can be implemented using systolic arrays.
2.1 Convolution Systolic Array
For example, consider a convolution of two sequences {x[n]} and {h[n]}:

$$ y[n] = \sum_{k=0}^{K-1} h[k]\, x[n-k], \qquad 0 \le n \le N-1. \tag{1} $$
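
As a point of reference for the array realization, the sum in Eq. (1) can be evaluated directly with two nested loops, where each inner iteration is one multiply-and-accumulate step. The following is a minimal sketch under stated assumptions, not the systolic design itself: the function name `convolve_direct` is hypothetical, N is taken to be the length of x, and x[m] is assumed to be zero for m < 0.

```python
def convolve_direct(h, x):
    """Evaluate y[n] = sum_{k=0}^{K-1} h[k] * x[n-k] for 0 <= n <= N-1.

    Assumptions (not stated in the text): N = len(x), K = len(h),
    and x[m] = 0 for m < 0.
    """
    K, N = len(h), len(x)
    y = [0] * N
    for n in range(N):
        acc = 0  # each output starts from the initial value 0
        for k in range(K):
            if n - k >= 0:  # skip terms where x[n-k] is assumed to be zero
                acc += h[k] * x[n - k]  # one multiply-and-accumulate (MAC)
        y[n] = acc
    return y


# Example with K = 4 coefficients and N = 5 input samples
h = [1, 2, 3, 4]
x = [1, 0, 2, -1, 3]
print(convolve_direct(h, x))  # prints [1, 2, 5, 7, 7]
```

The sequential version performs up to K MAC operations per output sample one after another; a systolic array distributes these operations over K processing elements so that they overlap in time.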
A systolic array realization of this algorithm is shown in Fig. 2 (K = 4). In
Fig. 2a, the block diagram of the systolic array and the pattern of data movement are
depicted. The block diagram of an individual processing element (PE) is illustrated
in Fig. 2b, where a shaded rectangle represents a buffer (delay element) that can be
implemented with a register. The output y[n] begins its evaluation at the upper left
input with initial value 0. When it enters each PE, the multiply-and-accumulate
(MAC) operation