[Figure labels: Clock Gen.; A-drv.; B-drv.; C-drv.; D-drvs.; GCKD cells; Module (128-256 F/Fs); Clock Control Registers; Hardware (dynamic) and Software (static) control. CCP: Control Clock Pin; GCKD: Gated Clock Driver Cell; ph1: edge-trigger F/F; ph2: transparent latch.]
Fig. 3.20 Clock-gating method of SH-X2
3.1.5 Efficient Parallelization of SH-4 FPU
In 1995, Hitachi developed the SH-3E, the first embedded processor with an on-chip
floating-point unit (FPU), mainly for a home game console. It operated at 66 MHz
and achieved a peak performance of 132 MFLOPS with a floating-point
multiply-accumulate instruction (FMAC). At that time, on-chip FPUs were popular in
PC/server processors, but there was no demand for an FPU in embedded processors,
mainly because it was too expensive to integrate. However, programming game
consoles became difficult as they moved to higher resolutions and more advanced 3D
graphics features. In particular, it was difficult to avoid overflow and underflow
with fixed-point data, whose dynamic range is small, so there was a demand to use
floating-point data. Since four-way parallel operation on 16-bit fixed-point data
was easy to implement, switching the data type to floating point had to deliver
equivalent performance at a reasonable cost.
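The 132-MFLOPS peak follows from the clock rate under two assumptions not stated above: one FMAC is issued per cycle, and each FMAC is counted as two floating-point operations (a multiply and an add), as is conventional for peak-rate figures:

```latex
\text{Peak} = 66\ \text{MHz} \times 1\ \frac{\text{FMAC}}{\text{cycle}} \times 2\ \frac{\text{FLOP}}{\text{FMAC}} = 132\ \text{MFLOPS}
```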
Since an FPU was about three times as large as a fixed-point unit, and a four-way
SIMD data path was four times as large as a normal one, a four-way SIMD FPU was
too expensive to adopt. Further, the FPU architecture of the SH-3E was limited by
the 16-bit fixed-length ISA. Floating-point operations had long latencies and
required more registers than fixed-point operations, but the ISA could define only
16 registers. A typical 3D-graphics transformation matrix is four by four and
occupies 16 registers, leaving no register for other values. Therefore, an
efficient method of parallelizing the FPU had to be developed that solved these
issues.
3.1.5.1 Floating-Point Architecture Extension
Sixteen was the maximum number of registers that the 16-bit fixed-length ISA could
specify directly. Therefore, the registers were extended to 32 as two banks of 16
registers. The two banks are the front and back banks, named FR0-FR15 and
XF0-XF15, respectively, and they are switched by changing the control bit FPSCR.FR
in the floating-point status and control register (FPSCR). Most instructions use