Hardware Reference
In-Depth Information
Fig. 3.38 Benchmark performance of SH-X at 400 MHz (polygons/s). Coordinate and perspective transformations: 36M with special instructions (11 cycles/polygon), 15M without (x2.4). Plus intensity calculation: 21M with special instructions (19 cycles), 7.7M without (x2.7).
not used. Similarly, when intensity is also calculated, execution takes 19 cycles
with special instructions and 52 cycles without, a 63% reduction.
Figure 3.38 shows the 3D graphics benchmark performance at 400 MHz, derived
from the cycle counts in Fig. 3.37. Without special instructions, the coordinate and
perspective transformation performance is 15 M polygons/s. With special
instructions, performance improves 2.4 times to 36 M polygons/s. Similarly, with
intensity calculation but without special instructions, 7.7 M polygons/s is
achieved; using special instructions raises this 2.7 times to 21 M polygons/s.
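The throughput figures follow directly from the clock rate divided by the cycles per polygon. The short sketch below reproduces that arithmetic. The function and dictionary names are illustrative, and the 26-cycle count for transformations without special instructions is an assumption: it is the transformation count the text gives for the SH-4 ISA on the superpipeline, which matches the 52-cycle total quoted for the case without special instructions.

```python
# Convert cycles per polygon into M polygons/s at a given clock frequency.
CLOCK_HZ = 400e6  # SH-X operating frequency quoted in the text

# Cycles per polygon. The 26-cycle entry is an assumption taken from the
# SH-4-ISA-on-superpipeline case; the other three counts are quoted directly.
CYCLES = {
    "transform": {"with": 11, "without": 26},
    "transform_plus_intensity": {"with": 19, "without": 52},
}


def mpolygons_per_s(cycles, clock_hz=CLOCK_HZ):
    """Throughput in millions of polygons per second."""
    return clock_hz / cycles / 1e6


def speedup(case):
    """Cycle-count ratio: without special instructions over with them."""
    return CYCLES[case]["without"] / CYCLES[case]["with"]


for case, c in CYCLES.items():
    print(
        f"{case}: {mpolygons_per_s(c['with']):.1f} M/s with, "
        f"{mpolygons_per_s(c['without']):.1f} M/s without, "
        f"x{speedup(case):.1f}"
    )
```

Under these assumptions the results round to the 36 M, 15 M, 21 M, and 7.7 M polygons/s and the 2.4 and 2.7 times speedups reported for the figure.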
It is useful to compare the SH-3E, SH-4, and SH-X performance with the same
benchmark. Figure 3.39 shows the resource-occupying cycles of the SH-3E, SH-4,
and SH-X. The main difference between the SH-4 and the SH-X is the newly defined
FSRRA and FSCA, and the effect of the FSRRA is clearly shown in the figure.
The conventional SH-3E architecture took 68 cycles for the coordinate and
perspective transformations and 74 cycles for the intensity calculation, 142 cycles
in total. Applying a superscalar architecture and the SRT method for FDIV/FSQRT
while keeping the SH-3E ISA reduced these to 39, 42, and 81 cycles, respectively.
The SH-4 architecture, with FIPR/FTRV and out-of-order FDIV/FSQRT, reduced
them further to 20, 19, and 39 cycles. Performance was good, but in this case only
the FDIV/FSQRT resource was busy. Applying the superpipeline architecture while
keeping the SH-4 ISA raised the counts to 26, 26, and 52 cycles. Although the
superpipeline architecture raised the operating frequency, the cycle performance
degraded so much that almost no net performance gain was achieved. With the
SH-X ISA and its FSRRA instruction, the counts fell to 11, 8, and 19 cycles.
Clearly, the FSRRA solved the long pitch problem of FDIV/FSQRT.
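The cycle counts above can be tabulated to check the totals and to see why the superpipelined SH-4 ISA gained so little. Since throughput is frequency divided by cycles per polygon, a design that needs more cycles must raise its clock by the same ratio just to break even. The sketch below uses only the counts quoted in the text; the configuration labels and helper names are illustrative, and no operating frequencies are assumed.

```python
# (transform, intensity, total) cycles per polygon, as quoted in the text.
CONFIGS = {
    "SH-3E": (68, 74, 142),
    "SH-3E ISA, superscalar + SRT": (39, 42, 81),
    "SH-4": (20, 19, 39),
    "SH-4 ISA, superpipeline": (26, 26, 52),
    "SH-X": (11, 8, 19),
}


def total_is_consistent(name):
    """Check that the quoted total equals transform + intensity cycles."""
    transform, intensity, total = CONFIGS[name]
    return transform + intensity == total


def cycle_ratio(slower, faster):
    """Ratio of total cycles; equals the speedup at equal clock frequency."""
    return CONFIGS[slower][2] / CONFIGS[faster][2]


# Clock increase the superpipelined SH-4 ISA needs merely to match SH-4.
breakeven = cycle_ratio("SH-4 ISA, superpipeline", "SH-4")

for name in CONFIGS:
    print(f"{name}: total consistent = {total_is_consistent(name)}")
print(f"break-even clock ratio: {breakeven:.2f}")
print(f"SH-X vs SH-3E at equal clock: x{cycle_ratio('SH-3E', 'SH-X'):.2f}")
```

The break-even ratio of 52/39 means the superpipelined design must run about one third faster before it shows any gain over the SH-4, whereas the SH-X count of 19 cycles improves on both.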
Because efficiency was a key design goal, we evaluated both area and power
efficiency. Figure 3.40 shows the area efficiencies of the SH-3E, SH-4, and
SH-X. The upper half shows architectural performance, relative area, and architectural
 