3.2 Localized and Single Assignment Algorithm Formulation
As demonstrated in Sect. 2, a systolic array is a parallel, distributed computing
platform in which each PE executes identical operations. By connecting the PEs in
specific configurations and providing input data with the right timing, a systolic array
can perform data-intensive computations in a rhythmic, synchronous fashion.
Therefore, to implement a given algorithm on a systolic array, its formulation
may need to be adjusted. Specifically, since computation takes place at physically
separated PEs, data movement in a systolic algorithm must be specified explicitly.
Moreover, unnecessary restrictions in the algorithm formulation that may impede the
exploitation of inherent parallelism must be removed.
A closer examination of Fig. 5 reveals two potential problems in light of the above
arguments: (i) the variables y[n], h[k], and x[n − k] are one-dimensional arrays, while
each index vector i = (n, k) resides in a two-dimensional space; (ii) the memory
location y[n] is repeatedly assigned new values, K times during the k
loop, before the final result is obtained.
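Fig. 5 itself is not reproduced here, but the discussion implies the familiar nested-loop FIR formulation y[n] = Σ_k h[k] x[n − k]. The following C sketch (array sizes, tap values, and the boundary guard are illustrative assumptions, not taken from the text) makes both problems visible: the one-dimensional arrays are read throughout the two-dimensional (n, k) iteration space, and y[n] is overwritten K times.

#include <stdio.h>

#define N 8   /* number of output samples (illustrative size) */
#define K 4   /* number of filter taps (illustrative size)    */

int main(void) {
    double h[K] = {0.25, 0.25, 0.25, 0.25};      /* example taps   */
    double x[N] = {1, 2, 3, 4, 5, 6, 7, 8};      /* example input  */
    double y[N];

    for (int n = 0; n < N; n++) {
        y[n] = 0.0;                               /* y[n] is rewritten below ...      */
        for (int k = 0; k < K; k++) {
            if (n - k >= 0)                       /* guard the left boundary of x     */
                y[n] += h[k] * x[n - k];          /* ... once per k, K times in total */
        }
    }

    for (int n = 0; n < N; n++)
        printf("y[%d] = %g\n", n, y[n]);
    return 0;
}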
Having one-dimensional variable arrays in a two-dimensional index space
implies that the same input data will be needed when executing the loop body
H(i) at different iterations (index points). In a systolic array, H(i) and H(j), i ≠ j,
are likely to be executed at different PEs. As such, how these variables are
distributed to the different index points where they are needed must be specified
explicitly in the algorithm. Furthermore, a design philosophy that dominates the
development of systolic arrays is to discourage on-chip global interconnect because of its
many potential drawbacks. Hence, data movement is restricted to local
communication, namely passing data from one PE to one or more of its nearest
neighboring PEs in the systolic array. This restriction can be imposed by limiting the
propagation of such a global variable from one index point to its nearest neighboring
index points. For this purpose, we make the following modification of Fig. 5:
h[k] ⇒ h1[n, k]  such that  h1[0, k] = h[k],  h1[n, k] = h1[n − 1, k].
x[n] ⇒ x1[n, k]  such that  x1[n, 0] = x[n],  x1[n, k] = x1[n − 1, k − 1].
Note that the equations for h1 and x1 are chosen based on the fact that h[k] must be
made available over the entire range of the index n, and x[n − k] must be made available
to all index points (n, k) that share the same value of n − k. An algorithm in which all
variables are passed from one iteration (index point) to its neighboring index points
is called a (variable) localized algorithm.
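As a sketch of how the localized recurrences above could be expressed, the following C fragment (sizes and data are again illustrative, and the sequential loops merely stand in for the PEs of an actual array) propagates h through h1 along the n direction and x through x1 along the diagonal, so each iteration (n, k) only reads data produced at (n − 1, k) or (n − 1, k − 1).

#include <stdio.h>

#define N 8
#define K 4

int main(void) {
    double h[K] = {0.25, 0.25, 0.25, 0.25};
    double x[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double h1[N][K], x1[N][K], y[N];

    for (int n = 0; n < N; n++) {
        y[n] = 0.0;
        for (int k = 0; k < K; k++) {
            /* h1[0][k] = h[k]; otherwise propagate along the n direction */
            h1[n][k] = (n == 0) ? h[k] : h1[n - 1][k];
            /* x1[n][0] = x[n]; otherwise propagate along the diagonal,
             * with zero injected at the boundary where n - k < 0        */
            x1[n][k] = (k == 0) ? x[n]
                     : (n == 0) ? 0.0
                     : x1[n - 1][k - 1];
            y[n] += h1[n][k] * x1[n][k];   /* y[n] is still multiply assigned */
        }
    }

    for (int n = 0; n < N; n++)
        printf("y[%d] = %g\n", n, y[n]);
    return 0;
}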
The repeated assignment of different intermediate results of y[n] to the same
memory location causes an unwanted output dependence relation in the algorithm
formulation. Output dependence is a type of false data dependence that impedes
the potential parallel execution of a given algorithm. The output dependence
can be removed if the algorithm is formulated to obey a single assignment rule;
that is, every memory address (variable name) is assigned a new value
only once during the execution of the algorithm. To remedy this, one would create new
variables to hold the intermediate results of y[n].
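The excerpt breaks off here, but the standard remedy can be sketched as follows, under the assumption that the new variable is named y1 in analogy with h1 and x1: every partial sum gets its own location y1[n, k], and the final output is read from y1[n, K − 1], so no location is ever written twice.

#include <stdio.h>

#define N 8
#define K 4

int main(void) {
    double h[K] = {0.25, 0.25, 0.25, 0.25};
    double x[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double h1[N][K], x1[N][K], y1[N][K];

    for (int n = 0; n < N; n++) {
        for (int k = 0; k < K; k++) {
            h1[n][k] = (n == 0) ? h[k] : h1[n - 1][k];
            x1[n][k] = (k == 0) ? x[n]
                     : (n == 0) ? 0.0
                     : x1[n - 1][k - 1];
            /* accumulate along k; each y1[n][k] is written exactly once */
            y1[n][k] = ((k == 0) ? 0.0 : y1[n][k - 1])
                     + h1[n][k] * x1[n][k];
        }
    }

    /* the final FIR output is the last partial sum in each row */
    for (int n = 0; n < N; n++)
        printf("y[%d] = %g\n", n, y1[n][K - 1]);
    return 0;
}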