Fig. 3.72 FIFO operation. Timing of the instruction sequence A1, A2, A3, C1, C2,
C3, A4 (A1~A4: MPA instructions; C1~C3: MPC instructions), shown without FIFO
and with FIFO. Without FIFO, the timing contains WAIT cycles and a 3-cycle IDLE
period. With FIFO, the "WAIT" and "IDLE" cycles are eliminated.
These instructions in FIFO are executed by the MPA in parallel with the controller
operations of C2-A4.
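The overlap described here can be illustrated with a toy cycle-count model. This is only a sketch under assumed costs, not the MX-2 timing: the function names, the 2-cycle cost of an MPA instruction, the 1-cycle cost of an MPC instruction, the 1-cycle FIFO push, and the unbounded FIFO depth are all hypothetical. Only the instruction sequence A1, A2, A3, C1, C2, C3, A4 comes from the figure.

```python
# Toy model only: all cycle costs below are assumptions, not MX-2 figures.
PROGRAM = ["A1", "A2", "A3", "C1", "C2", "C3", "A4"]  # sequence from Fig. 3.72


def cycles_without_fifo(program, array_cycles=2, ctrl_cycles=1):
    """Fully serialized: the controller waits during every A instruction,
    and the array idles during every C instruction."""
    total = 0
    for op in program:
        total += array_cycles if op.startswith("A") else ctrl_cycles
    return total


def cycles_with_fifo(program, array_cycles=2, ctrl_cycles=1, push_cycles=1):
    """Decoupled: the controller pushes A instructions into an (assumed
    unbounded) FIFO and moves on; the array drains the FIFO in order."""
    ctrl_free = 0   # cycle at which the controller is next free
    array_free = 0  # cycle at which the array is next free
    for op in program:
        if op.startswith("A"):
            ctrl_free += push_cycles
            # The array starts once the instruction is queued and it is free.
            array_free = max(array_free, ctrl_free) + array_cycles
        else:
            ctrl_free += ctrl_cycles
    return max(ctrl_free, array_free)


print(cycles_without_fifo(PROGRAM))  # 11
print(cycles_with_fifo(PROGRAM))     # 9
```

Under these assumed costs the queued MPA instructions finish while the controller works through C1 to C3, which is the source of the saved cycles. The exact numbers depend entirely on the chosen costs.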
In addition to the above-mentioned technologies, the MX-2 is equipped with a
double frequency mode, which raises the maximum operating frequency of the MX-2
and thereby its performance. The high ALU throughput of the MX core is realized
by the read-modify-write (RMW) operation of the SRAM [55]. The RMW operation
also saves power, because a read and write pair activates the SRAM word line
only once. The RMW operation is therefore useful for power-efficient ALU
operation. However, it also limits the operating frequency of the MX core. The
MX-2 core has a normal frequency (NF) mode and a double frequency (DF) mode. In
the NF mode, the RMW operation is executed in one cycle. In the DF mode, the RMW
operation is divided into two cycles, so the MX-2 can be operated at a higher
frequency. This mode is used when high performance is required rather than low
power consumption. In the DF mode, the cycle counts of 8-bit addition and 8-bit
MAC increase to 6 cycles and 18 cycles, respectively. In image processing
applications, the cycle count in the DF mode is up to 40% higher than in the NF
mode. With the DF mode, the maximum operating frequency of the MX-2 can be
almost doubled compared with the NF mode. Because the frequency gain outweighs
the increase in cycle count, the processing performance of real applications
can be improved with the DF mode.
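The trade-off between more cycles and a higher clock can be checked with a short calculation. The sketch below assumes that execution time is simply cycle count divided by clock frequency; the function name `df_speedup` is hypothetical. The two ratios are the limiting figures quoted in the text (frequency almost doubled, cycle count up to 40% higher), so the result is indicative only.

```python
def df_speedup(freq_ratio: float, cycle_ratio: float) -> float:
    """Speedup of DF mode over NF mode, assuming time = cycles / frequency.

    freq_ratio  = f_DF / f_NF            (close to 2 according to the text)
    cycle_ratio = cycles_DF / cycles_NF  (up to 1.4 for image processing)
    """
    if freq_ratio <= 0 or cycle_ratio <= 0:
        raise ValueError("ratios must be positive")
    return freq_ratio / cycle_ratio


# Limiting figures from the text: 2x frequency, 40% more cycles.
print(f"{df_speedup(2.0, 1.4):.2f}")  # 1.43
```

Under this assumption the DF mode is faster whenever the frequency ratio exceeds the cycle ratio, so a 40% cycle penalty is still a net gain if the clock is nearly doubled.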
Figure 3.73 compares the performance of MX-1 and MX-2 when various application
programs are executed. To clarify the effect of each improvement, Case A, which
combines the 4-bit PE with the conventional MPC, is added to the graph. An
improvement of about 20-40% is obtained with the 4-bit-grained PE alone. The
improved MPC yields a further improvement of about 20-40%.
Figure 3.74 shows the micrograph of the MX-2 core, and the performance of
MX-2 is summarized in Table 3.18.
 