Information Technology Reference
In-Depth Information
are fed to the circuit. Data from different sequences are completely independent.
The aim is to calculate Eqs. (1) and (2):

$$\sum_{i=0}^{3} A \ast B \tag{1}$$

$$\sum_{i=0}^{3} C \ast D \tag{2}$$
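As a plain software reference for Eqs. (1) and (2), the following minimal sketch computes the two sums. It assumes that A, B, C and D are four-element sequences multiplied element by element (as the labels a0, b0, c0, d0 in Fig. 14 suggest); the function name `mac` and the sample values are hypothetical and not taken from the source.

```python
def mac(xs, ys):
    """Multiply-accumulate: sum of the element-wise products of two sequences."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y  # one multiply and one add per element pair
    return acc

# Hypothetical sample data, four elements per sequence (i = 0..3).
A, B = [1, 2, 3, 4], [5, 6, 7, 8]
C, D = [2, 0, 1, 3], [4, 9, 6, 1]

s = mac(A, B)  # Eq. (1)
o = mac(C, D)  # Eq. (2)
```

The two calls share no state, which mirrors the statement that data from different sequences are completely independent.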
[Figure 14: four panels showing the multipliers and adders (ADD) of the MAC unit at (A) T = 0, (B) T = 26 ck, (C) T = 52 ck, (D) T = 78 ck.]
Fig. 14. Example of interleaving applied to a MAC unit. Two input sequences are sent to the circuit; each delivers one value every 52 clock cycles, with 26 clock cycles between data of different sequences. The left column shows the calculation of sequence A, B, while the right one shows the calculation of sequence C, D. As can be observed, the two sequences are executed in parallel but do not interfere with each other. (A) a0 and b0 are sent to the circuit. (B) After 26 clock cycles, c0 and d0 are sent to the circuit. (C) At 52 clock cycles, a1 and b1 are sent to the circuit and reach the adder input exactly together with S0, the result of the previous operation. (D) At 78 clock cycles, c1 and d1 are sent to the circuit.
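The timing in the caption can be checked with a small cycle-level model. The sketch below is only an illustration under stated assumptions: the adder feedback loop is modelled as a 52-stage delay line that starts at zero, the multiplier is treated as ideal (no delay), and the names `LOOP`, `OFFSET`, `interleaved_mac` and the sample data are hypothetical.

```python
from collections import deque

LOOP = 52            # adder feedback loop length in clock cycles
OFFSET = LOOP // 2   # spacing between the two sequences (26 ck)

def interleaved_mac(A, B, C, D):
    """Simulate one shared adder loop fed by two interleaved sequences.

    Returns a dict mapping each input clock cycle to the adder output
    produced at that cycle.
    """
    # Clock cycle -> product arriving at the adder (ideal multiplier).
    schedule = {}
    for i, (a, b) in enumerate(zip(A, B)):
        schedule[i * LOOP] = a * b            # a_i, b_i at T = 0, 52, 104, ...
    for i, (c, d) in enumerate(zip(C, D)):
        schedule[i * LOOP + OFFSET] = c * d   # c_i, d_i at T = 26, 78, 130, ...

    loop = deque([0] * LOOP)   # registers of the feedback loop, initially zero
    outputs = {}
    for t in range(max(schedule) + 1):
        fed_back = loop.popleft()             # value that entered LOOP cycles ago
        out = fed_back + schedule.get(t, 0)   # adder
        loop.append(out)
        if t in schedule:
            outputs[t] = out
    return outputs

# Hypothetical sample data, four elements per sequence.
A, B = [1, 2, 3, 4], [5, 6, 7, 8]
C, D = [2, 0, 1, 3], [4, 9, 6, 1]

outputs = interleaved_mac(A, B, C, D)
final_s = outputs[3 * LOOP]            # last partial sum of sequence A, B
final_o = outputs[3 * LOOP + OFFSET]   # last partial sum of sequence C, D
```

In this model the value written at T = 0 (S0) re-emerges from the loop exactly at T = 52, when a1 and b1 arrive, while the value written at T = 26 (O0) re-emerges at T = 78, so the two accumulations share the adder without mixing.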
At the beginning, a0 and b0, the first two values of the first sequence, are sent to the circuit (Fig. 14(A)). Just for this example, to clarify the interleaving principle, the multiplier is considered ideal, with no delay, so data propagate directly from the general MAC inputs to the adder inputs. After a time equal to half the loop length (26 clock cycles in this case), c0 and d0 are sent to the inputs (Fig. 14(B)). This operation is correct because there is no data dependency between them and a0, b0. At the 52nd clock cycle, a1 and b1 are then sent to the circuit and arrive at the adder inputs together with S0, the result of the previous operation (Fig. 14(C)). After another 26 clock cycles c1