3.2 Localized and Single Assignment Algorithm Formulation
As demonstrated in Sect. 2, a systolic array is a parallel, distributed computing
platform in which each PE executes identical operations. By connecting the PEs in
specific configurations and providing input data with the right timing, a systolic array
can perform data-intensive computations in a rhythmic, synchronous fashion.
Therefore, to implement a given algorithm on a systolic array, its formulation
may need to be adjusted. Specifically, since computation takes place at physically
separated PEs, data movement in a systolic algorithm must be specified explicitly.
Moreover, unnecessary restrictions in the algorithm formulation that may impede the
exploitation of inherent parallelism must be removed.
A closer examination of Fig. 5 reveals two potential problems in light of the above
arguments: (i) the variables y[n], h[k], and x[n − k] are one-dimensional arrays, while
each index vector i = (n, k) resides in a two-dimensional space; (ii) the memory
location y[n] is repeatedly assigned new values, K times during the k
loop, before the final result is obtained.
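Fig. 5 itself is not reproduced here, but the discussion implies the familiar nested-loop FIR formulation y[n] = Σ_k h[k] x[n − k]. The following C sketch (array sizes, tap values, and the boundary guard are illustrative assumptions, not taken from the text) makes both problems visible: the one-dimensional arrays are read throughout the two-dimensional (n, k) iteration space, and y[n] is overwritten K times.

#include <stdio.h>

#define N 8   /* number of output samples (illustrative size) */
#define K 4   /* number of filter taps (illustrative size)    */

int main(void) {
    double h[K] = {0.25, 0.25, 0.25, 0.25};      /* example taps   */
    double x[N] = {1, 2, 3, 4, 5, 6, 7, 8};      /* example input  */
    double y[N];

    for (int n = 0; n < N; n++) {
        y[n] = 0.0;                               /* y[n] is rewritten below ...      */
        for (int k = 0; k < K; k++) {
            if (n - k >= 0)                       /* guard the left boundary of x     */
                y[n] += h[k] * x[n - k];          /* ... once per k, K times in total */
        }
    }

    for (int n = 0; n < N; n++)
        printf("y[%d] = %g\n", n, y[n]);
    return 0;
}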
Having one-dimensional variable arrays in a two-dimensional index space
implies that the same input data will be needed when executing the loop body
H(i) at different iterations (index points). In a systolic array, H(i) and H(j), i ≠ j,
are likely to be executed at different PEs. As such, how these variables are
distributed to the different index points where they are needed must be specified
explicitly in the algorithm. Furthermore, a design philosophy that dominates the
development of systolic arrays is to discourage on-chip global interconnect because of its
many potential drawbacks. Hence, data movement is restricted to local
communication, namely passing data from one PE to one or more of its nearest
neighboring PEs in the systolic array. This restriction can be imposed by limiting the
propagation of such a global variable from one index point to its nearest neighboring
index points. For this purpose, we make the following modification of Fig. 5:
h[k] ⇒ h1[n, k]  such that  h1[0, k] = h[k],  h1[n, k] = h1[n − 1, k].
x[n] ⇒ x1[n, k]  such that  x1[n, 0] = x[n],  x1[n, k] = x1[n − 1, k − 1].
Note that the equations for h1 and x1 are chosen based on the fact that h[k] must be
made available over the entire range of the index n, and x[n − k] must be made available
to all index points (n, k) that share the same value of n − k. An algorithm in which all
variables are passed from one iteration (index point) to its neighboring index points
is called a (variable) localized algorithm.
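As a sketch of how the localized recurrences above could be expressed, the following C fragment (sizes and data are again illustrative, and the sequential loops merely stand in for the PEs of an actual array) propagates h through h1 along the n direction and x through x1 along the diagonal, so each iteration (n, k) only reads data produced at (n − 1, k) or (n − 1, k − 1).

#include <stdio.h>

#define N 8
#define K 4

int main(void) {
    double h[K] = {0.25, 0.25, 0.25, 0.25};
    double x[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double h1[N][K], x1[N][K], y[N];

    for (int n = 0; n < N; n++) {
        y[n] = 0.0;
        for (int k = 0; k < K; k++) {
            /* h1[0][k] = h[k]; otherwise propagate along the n direction */
            h1[n][k] = (n == 0) ? h[k] : h1[n - 1][k];
            /* x1[n][0] = x[n]; otherwise propagate along the diagonal,
             * with zero injected at the boundary where n - k < 0        */
            x1[n][k] = (k == 0) ? x[n]
                     : (n == 0) ? 0.0
                     : x1[n - 1][k - 1];
            y[n] += h1[n][k] * x1[n][k];   /* y[n] is still multiply assigned */
        }
    }

    for (int n = 0; n < N; n++)
        printf("y[%d] = %g\n", n, y[n]);
    return 0;
}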
The repeated assignment of different intermediate results of y[n] to the same
memory location causes an unwanted output dependence relation in the algorithm
formulation. Output dependence is a type of false data dependence that impedes
the potential parallel execution of a given algorithm. The output dependence
can be removed if the algorithm is formulated to obey a single assignment rule;
that is, every memory address (variable name) is assigned a new value
only once during the execution of the algorithm. To remedy this, one would create new
variables to hold the intermediate results of y[n].
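The excerpt breaks off here, but the standard remedy can be sketched as follows, under the assumption that the new variable is named y1 in analogy with h1 and x1: every partial sum gets its own location y1[n, k], and the final output is read from y1[n, K − 1], so no location is ever written twice.

#include <stdio.h>

#define N 8
#define K 4

int main(void) {
    double h[K] = {0.25, 0.25, 0.25, 0.25};
    double x[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double h1[N][K], x1[N][K], y1[N][K];

    for (int n = 0; n < N; n++) {
        for (int k = 0; k < K; k++) {
            h1[n][k] = (n == 0) ? h[k] : h1[n - 1][k];
            x1[n][k] = (k == 0) ? x[n]
                     : (n == 0) ? 0.0
                     : x1[n - 1][k - 1];
            /* accumulate along k; each y1[n][k] is written exactly once */
            y1[n][k] = ((k == 0) ? 0.0 : y1[n][k - 1])
                     + h1[n][k] * x1[n][k];
        }
    }

    /* the final FIR output is the last partial sum in each row */
    for (int n = 0; n < N; n++)
        printf("y[%d] = %g\n", n, y1[n][K - 1]);
    return 0;
}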