Processor Cores - Heterogeneous Multicore Processor Technologies for Embedded Systems - page 91

[Figure: three-step operation flow for entries #0 to #3, each PE having a temporary register (Temp. Reg.) and an ALU. Step 1, load to Temp. Reg. (H-ch operation): each temporary register holds a0, a1, a2, a3 from its own entry. Step 2, shifting Temp. Reg. (V-ch operation): the temporary registers of entries #0 to #3 hold *, a0, a1, a2, where * is a value arriving from outside the entries shown. Step 3, execution and store (H-ch operation): the entries store b0+*, b1+a0, b2+a1, b3+a2.]

Fig. 3.64 Operation flow of V-ch

SIMD processors need an efficient mechanism for communicating data among PEs, because large data sets and complex algorithms must be processed across multiple data entries. The V-ch in Fig. 3.61 is designed for this purpose, and Fig. 3.64 shows the operation flow using it, taking as an example the addition of each entry's data to data stored in the neighboring entry. In the first step, the operands are loaded into the temporary registers of the PEs. In the second step, all the data in the temporary registers are moved by one entry, as in a shift register; this is the V-ch operation. In the third step, the other operands are added to the data in the temporary registers, and the results are stored. The proposed simple PE network with V-ch enables flexible processing and is effective for many applications, such as convolutions and FFTs.
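The three steps can be mimicked in software as a rough sketch. The function name `vch_add_neighbor` and the list-based model of the entries are illustrative assumptions, not part of the hardware description, and the code assumes that the boundary entries wrap around to each other.

```python
def vch_add_neighbor(a, b, step=1):
    """Return b[i] + a[i - step] for every entry i (hypothetical software model).

    Assumes the entries form a ring, so the first entry receives data
    from the last one.
    """
    n = len(a)
    # Step 1 (H-ch operation): each PE loads its own operand a[i]
    # into its temporary register.
    temp = list(a)
    # Step 2 (V-ch operation): all temporary registers move by `step`
    # entries at once, so entry i now holds what entry i - step held.
    temp = [temp[(i - step) % n] for i in range(n)]
    # Step 3 (H-ch operation): each PE adds its other operand b[i]
    # to the shifted value and stores the sum.
    return [b[i] + temp[i] for i in range(n)]


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
result = vch_add_neighbor(a, b)
print(result)  # [14, 21, 32, 43]
```

Every entry performs the same operation in each step, which is why a single SIMD control stream is enough to drive the whole array in this model.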

Although several kinds of PE networks have been reported [57-62] for massively parallel processors, those circuits have substantial area overhead, or their operations are too complex to be controlled by simple SIMD control signals. Given this background, we adopt the simple shift-register-type network shown in Fig. 3.65. The temporary registers in the PEs are used to form the shift registers. As shown in the example of the “+1 entry move,” the data in each entry moves to the neighboring entry. A feature of this implementation is that the entries at the boundaries, entry #0 and entry #2,047, can exchange data with each other. Any movement can be realized using only the 1-entry shift step, but a long-distance move then costs many cycles. To reduce the cycle overhead in this long-distance case, MX-1 supports several shift steps, such as ±1, ±2, ±4, …, ±256. Of course, any movements can also be realized by the combination of this configuration
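The benefit of multiple shift steps can be illustrated with a small sketch. It assumes that the supported steps are the powers of two from 1 to 256 in both directions, that the 2,048 entries form a ring, and that each shift takes one cycle. The names `plan_shifts` and `apply_shifts` are hypothetical, and the greedy plan is not necessarily the shortest possible sequence.

```python
# Assumed step sizes: powers of two from 1 to 256, usable in both directions.
SUPPORTED_STEPS = [256, 128, 64, 32, 16, 8, 4, 2, 1]
NUM_ENTRIES = 2048  # entries #0 to #2,047


def plan_shifts(distance, num_entries=NUM_ENTRIES):
    """Greedily pick signed shift steps that sum to `distance` on the ring."""
    d = distance % num_entries
    sign = 1
    # Take the shorter way around the ring.
    if d > num_entries // 2:
        d = num_entries - d
        sign = -1
    plan = []
    for step in SUPPORTED_STEPS:
        while d >= step:
            plan.append(sign * step)
            d -= step
    return plan


def apply_shifts(data, plan):
    """Apply each shift in turn; entry i receives the value from entry i - step."""
    n = len(data)
    for step in plan:
        data = [data[(i - step) % n] for i in range(n)]
    return data


plan = plan_shifts(300)
print(plan)       # [256, 32, 8, 4]
print(len(plan))  # 4 shifts instead of 300 single-entry shifts

moved = apply_shifts(list(range(NUM_ENTRIES)), plan)
print(moved[300])  # 0: the value from entry #0 now sits in entry #300
```

Under these assumptions, a move of 300 entries needs four shift operations rather than 300, and a move of 2,047 entries collapses to a single -1 shift because of the wraparound.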
