[Figure: four PEs (entries #0 to #3), each with a temporary register and an ALU, connected by the H-ch and the V-ch. Step 1: Load to Temp. Reg. (H-ch operation); entry #i holds a_i in its temporary register alongside b_i. Step 2: Shifting Temp. Reg. (V-ch operation); the temporary register of entry #i now holds a_(i-1), and entry #0 receives the value (*) from its preceding entry. Step 3: Execution & Store (H-ch operation); entry #i stores b_i + a_(i-1).]
Fig. 3.64
Operation flow of V-ch
SIMD processors need an efficient mechanism for communicating data among PEs,
because large data sets and complex algorithms must be processed using multiple
data entries. The V-ch in Fig. 3.61 is designed for this purpose, and the operation
flow using the V-ch is shown in Fig. 3.64, which illustrates how data are added to
the data stored in neighboring entries. In the first step, the operands are loaded
into the temporary registers of the PEs. In the second step, all the data in the
temporary registers are moved by one entry, just as in a shift register; this is
the V-ch operation. In the third step, the other operands are added to the data in
the temporary registers, and the results are stored. The proposed simple PE network
with the V-ch enables flexible processing and is effective for many applications,
such as convolutions and FFTs.
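The three steps above can be sketched in software as follows. This is only an illustrative model, not the MX-1 implementation: the function name `vch_neighbor_add`, the list-based representation of the entries, and the ring wraparound at the boundary entries are assumptions made for the example.

```python
def vch_neighbor_add(a, b, shift=1):
    """Model the three-step V-ch flow for a row of PEs (hypothetical helper).

    a[i] and b[i] are the two operands held by entry #i.
    Returns the values stored back by each entry after Step 3.
    """
    n = len(a)

    # Step 1 (H-ch operation): each PE loads its own operand a[i]
    # into its temporary register.
    temp = list(a)

    # Step 2 (V-ch operation): all temporary registers move by `shift`
    # entries at once, like a shift register. Wraparound between the
    # first and last entries is an assumption of this sketch.
    temp = [temp[(i - shift) % n] for i in range(n)]

    # Step 3 (H-ch operation): each PE adds the shifted value to its
    # other operand b[i] and stores the result.
    return [b[i] + temp[i] for i in range(n)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
print(vch_neighbor_add(a, b))  # [14, 21, 32, 43]
```

With a shift of +1, entry #i ends up with b_i + a_(i-1), which matches the final state in Fig. 3.64; in this four-entry model, entry #0 receives its value from the last entry.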
Although several kinds of PE networks have been reported [57-62] for massively
parallel processors, those circuits either have substantial area overhead or operate
in ways too complex to be controlled by simple SIMD control signals. For these
reasons, we adopt the simple shift-register-type network shown in Fig. 3.65.
The temporary registers in the PEs are used to form the shift registers. As shown in
the example of the “+1 entry move,” the data in each entry move to the neighboring
entry. A feature of this implementation is that the entries located at the boundaries,
entry #0 and entry #2,047, can exchange data with each other. Any movement can be
realized using only the 1-entry shift step; however, long-distance data movement
then costs many cycles. To reduce the cycle overhead in this long-distance case,
MX-1 supports several shift steps, such as ±1, ±2, ±4, …, ±256. Of course,
any movements can also be realized by the combination of this configuration