Fig. 3.68 Operation flow of MAC operation (to be continued). The figure has three columns: Action, Data Registers (SRAM), and PE. The data registers are divided into multiplicand, multiplier, and accumulator regions; the PE blocks are labeled XH/X, DF, N, S, and 2bit-FA. The actions shown are, in order: multiplier loaded and Booth-encoded; multiplicand added to accumulator; multiplicand added to accumulator; sign extension; multiplier loaded and Booth-encoded.
Fig. 3.69 Micrograph of MX-1 core
microprograms stored in the instruction RAM in the controller. With the proposed
circuit configuration, a 16-bit fixed-point signed MAC operation costs about 100
cycles in each PE, which is 56% fewer than with a non-Booth circuit configuration.
Because MX-1 executes 2,048 MAC operations in parallel, this cost of 100 cycles
normalizes to about 0.05 cycles per PE (100/2,048 ≈ 0.049). In this way, fast MAC
operations based on Booth's algorithm can be realized even with the 2-bit-grained
PE configuration of MX-1.
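As a rough illustration of the arithmetic behind a Booth-based MAC, the sketch below recodes a signed 16-bit multiplier two bits at a time and accumulates the selected multiples of the multiplicand. It assumes radix-4 (modified) Booth recoding, which the text does not state explicitly. The function names `booth_radix4_digits` and `booth_mac` are hypothetical. The sketch models only the numerical result, not the SRAM layout, the 2-bit PE datapath, or the cycle count of MX-1.

```python
def booth_radix4_digits(multiplier: int, bits: int = 16) -> list[int]:
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digit i has weight 4**i.
    Radix-4 recoding is an assumption, not taken from the source text.
    """
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # value of the overlapping 3-bit group
        prev = b1
    return digits


def booth_mac(acc: int, multiplicand: int, multiplier: int, bits: int = 16) -> int:
    """Return acc + multiplicand * multiplier using the Booth digits above."""
    for i, digit in enumerate(booth_radix4_digits(multiplier, bits)):
        # Each step adds 0, +/-1x, or +/-2x the multiplicand, shifted by 2*i bits.
        acc += (digit * multiplicand) << (2 * i)
    return acc


# Normalized cost quoted in the text: 100 cycles spread over 2,048 parallel MACs.
cycles_per_mac = 100 / 2048
```

Under this assumption a 16-bit multiplier yields 8 Booth digits, so at most 8 multiplicand additions are needed instead of 16 single-bit steps. For example, `booth_mac(10, 3, -5)` accumulates 3 × (−5) onto 10.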
Figure 3.69 shows the micrograph of the MX-1 core, and the performance of
MX-1 is summarized in Table 3.17.
 