Information Technology Reference
In-Depth Information
Fig. 24. WIL Systolic Array generic processing element structure.
$$Z_e = \sum_{n=1}^{i} d_n \qquad (3)$$
$$Z_{fo} = \sum_{n=i+1}^{j} d_n \qquad (4)$$
$$Z_{fb} = \sum_{n=j+1}^{k-1} d_n \qquad (5)$$
$$Z_o = d_k \qquad (6)$$
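As a minimal sketch, equations (3) to (6) can be read as sums of the per-stage delays d_n over four consecutive groups of stages: stages 1..i (entry), i+1..j (forward), j+1..k-1 (feedback), and the single stage k (output). That reading, the function name `section_latencies`, and the sample delays below are assumptions made for illustration only.

```python
def section_latencies(d, i, j):
    """Return (Z_e, Z_fo, Z_fb, Z_o) for a cell with stage delays d.

    d is a list where d[0] holds d_1 and d[-1] holds d_k (hypothetical
    layout); i and j are the 1-based stage indices that close the entry
    and forward sections, as assumed from equations (3) to (6).
    """
    k = len(d)
    if not (1 <= i <= j <= k - 1):
        raise ValueError("expected 1 <= i <= j <= k - 1")
    z_e = sum(d[:i])          # stages 1 .. i
    z_fo = sum(d[i:j])        # stages i+1 .. j
    z_fb = sum(d[j:k - 1])    # stages j+1 .. k-1
    z_o = d[k - 1]            # stage k
    return z_e, z_fo, z_fb, z_o


# Illustrative delays only, not taken from the source.
example = section_latencies([1, 2, 1, 3, 2, 1], i=2, j=4)
```

Under this reading the four values partition the stage delays, so their sum equals the total latency of the cell.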
Input data coming from outside enter the first block, while data coming from the neighboring processing element can enter at any stage of the cell. Inputs must be provided every $Z_{loop} = Z_{fo} + Z_{fb}$ cycles, that is, the total time of the feedback loop, in order to match them with the data coming from the feedback. To apply interleaving, the intrinsically pipelined nature of the structure can be exploited. Thus we can improve the performance and utilization of the cell by giving inputs every $J = \max\{d_n\}$ cycles. Every $J$ cycles a new operation can be started, and in this way $M$ different operations can be interleaved, where $M = Z_{loop} / J$ (integer division). After $Z_{loop}$ cycles, the second set of inputs is fed. When $Z_{loop}$ is not an exact multiple of $J$, the remainder of the division, called $R$, must be taken into account. After $M$ successive inputs have been fed, the following one must be provided with a delay of $J + R$, so as to synchronize with the value coming from the loop. $R$ represents the number of “stalls” that must be inserted between one set of $M$ inputs and the following set.
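The input timing described above can be sketched in a few lines. The sketch assumes that the loop latency Z_loop and the input period J are given in cycles, that M and R are the quotient and remainder of Z_loop divided by J, and that each set of M inputs starts exactly Z_loop cycles after the previous one. The function name `interleave_schedule` and the sample values are hypothetical.

```python
def interleave_schedule(z_loop, j_period, n_sets):
    """Return (M, R, start_cycles) for pipeline interleaving.

    z_loop   : total latency of the feedback loop, in cycles
    j_period : spacing J between consecutive inputs, in cycles
    n_sets   : number of sets of M inputs to schedule
    """
    if j_period <= 0 or z_loop < j_period:
        raise ValueError("expected 0 < J <= Z_loop")
    m, r = divmod(z_loop, j_period)  # M interleaved operations, R stalls
    start_cycles = []
    for s in range(n_sets):
        base = s * z_loop            # each set starts Z_loop after the previous one
        for q in range(m):
            start_cycles.append(base + q * j_period)
    return m, r, start_cycles


# Illustrative values only: Z_loop = 7 and J = 3 give M = 2 and R = 1.
m, r, starts = interleave_schedule(z_loop=7, j_period=3, n_sets=2)
```

With these values the inputs start at cycles 0, 3, 7 and 10: inputs inside a set are J = 3 cycles apart, while the gap between the last input of one set and the first of the next is J + R = 4 cycles.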
By applying pipeline interleaving, it is possible to evaluate $M$ different operations in parallel, increasing performance by a factor of $M$. If the number of stalls is high,