Systolic Arrays - Signal Processing Systems

Digital Signal Processing Reference


processor architecture that promises to alleviate the stringent timing constraint imposed by the global clock synchronization requirement. In Sect. 4, the wavefront array architecture and its related design methodology will be discussed.

These architectural features of the systolic array have motivated numerous developments of research and commercial computing architectures. Notable examples include the WARP and iWARP projects at CMU [1, 3, 7, 10]; the Transputer™ of INMOS [8, 19, 22, 26]; and the TMS320C40 DSP processor of Texas Instruments [23]. In Sect. 5 of this chapter, these systolic-array-motivated computing architectures will be briefly surveyed.

While the notion of the systolic array was first proposed three decades ago, its impact can still be felt today. Modern applications of the systolic array concept can be found in field programmable gate array (FPGA) chip architectures and network-on-chip (NoC) mesh array multi-core architectures. Computation-intensive special-purpose architectures, such as those for the discrete cosine transform and block motion estimation algorithms in video coding standards, as well as for QR factorization for least-squares filtering in wireless communication standards, have been incorporated in embedded chip designs. These latest real-world applications of systolic array architecture will be discussed in Sect. 6.

2 Systolic Array Computing Algorithms

A systolic array exhibits characteristics of parallelism (pipelining), regularity, and local communication. A large number of signal processing algorithms and numerical linear algebra algorithms can be implemented using systolic arrays.

2.1 Convolution Systolic Array

For example, consider a convolution of two sequences {x[n]} and {h[n]}:

$$y[n] = \sum_{k=0}^{K-1} h[k]\, x[n-k], \qquad 0 \le n \le N-1. \tag{1}$$
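As a rough illustration of Eq. (1), the Python sketch below evaluates the sum in two ways. The function names `direct_convolve` and `systolic_convolve` are hypothetical and do not come from the chapter. Both assume x[m] = 0 for m < 0, which the text does not state explicitly. The second function is a cycle-by-cycle simulation of one classic weight-stationary linear array, in which h[k] stays in PE k, each partial sum y passes through one delay register per PE, and each input x passes through two. This particular delay arrangement is an assumption chosen for illustration and may differ from the design in Fig. 2.

```python
def direct_convolve(x, h):
    """Evaluate Eq. (1) directly with nested multiply-and-accumulate loops."""
    N, K = len(x), len(h)
    y = [0] * N
    for n in range(N):
        for k in range(K):
            if n - k >= 0:          # assumption: x[m] = 0 for m < 0
                y[n] += h[k] * x[n - k]
    return y


def systolic_convolve(x, h):
    """Simulate a linear array of K PEs (assumed design, see lead-in).

    PE k holds h[k]. Between PE k and PE k+1 there are two delay
    registers on the x path and one delay register on the y path.
    Requires K >= 1.
    """
    N, K = len(x), len(h)
    x_regs = [[0, 0] for _ in range(K - 1)]   # [newest, oldest] per link
    y_regs = [0] * (K - 1)
    out = []
    for t in range(N + K - 1):                # extra cycles flush the pipeline
        x_in = x[t] if t < N else 0           # x enters at the left end
        y_in = 0                              # each y starts with value 0
        for k in range(K):
            y_out = y_in + h[k] * x_in        # MAC inside PE k
            if k < K - 1:
                # Read what the registers held before this clock edge ...
                next_x, next_y = x_regs[k][1], y_regs[k]
                # ... then latch the new values.
                x_regs[k] = [x_in, x_regs[k][0]]
                y_regs[k] = y_out
                x_in, y_in = next_x, next_y
        if t >= K - 1:                        # y[n] leaves the last PE at t = n + K - 1
            out.append(y_out)
    return out


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    h = [1, 0, -1, 2]
    print(direct_convolve(x, h))
    print(systolic_convolve(x, h))
```

Under these assumptions the two functions return the same sequence. In the simulated array, output y[n] emerges from the last PE at clock cycle n + K - 1, which reflects the pipeline latency of a K-stage array.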

A systolic array realization of this algorithm is shown in Fig. 2 (K = 4). In Fig. 2a, the block diagram of the systolic array and the pattern of data movement are depicted. The block diagram of an individual processing element (PE) is illustrated in Fig. 2b, where a shaded rectangle represents a buffer (delay element) that can be implemented with a register. The output y[n] begins its evaluation at the upper left input with initial value 0. When it enters each PE, the multiply-and-accumulate (MAC) operation
