Fig. 3.30 Benchmark performance of SH-4 at 200 MHz (cycles per polygon and
polygons/s, scalar vs. superscalar, for three configurations: 1) Conventional,
2) Pair Load/Store, etc., 3) Vector Inst., etc.)
register extension and the extended instructions of the FIPR and FTRV. The
corresponding scalar performances would be 0.7, 1.3, and 3.1-M polygons/s at
200 MHz for 287, 150, and 64 cycles, respectively. The superscalar performances
were about 70% higher than the scalar ones, compared with 30% for the Dhrystone
benchmark. This showed that the superscalar architecture was more effective for
multimedia applications than for general integer applications. Since the SH-3E
was a scalar processor without the SH-4's enhancements, it took 287 cycles, the
slowest case of the above performance evaluations. Therefore, the SH-4 achieved
287/40 = 7.2 times the cycle performance of the SH-3E for media processing such
as 3D graphics.
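The relation between cycles per polygon and throughput used above can be sketched as a short calculation. This is only an illustrative check of the quoted figures; the function name `polygons_per_second` and the constant names are assumptions made for this sketch, not anything from the original text.

```python
# Sketch: convert cycles per polygon into polygons per second at a fixed clock.
CLOCK_HZ = 200e6  # 200 MHz benchmark clock quoted in the text


def polygons_per_second(cycles_per_polygon, clock_hz=CLOCK_HZ):
    """Throughput is the clock frequency divided by the cycles spent per polygon."""
    return clock_hz / cycles_per_polygon


# Scalar cycle counts quoted in the text.
scalar_cycles = [287, 150, 64]
for cycles in scalar_cycles:
    mpps = polygons_per_second(cycles) / 1e6
    print(f"{cycles} cycles/polygon -> {mpps:.1f} M polygons/s")

# Cycle-performance ratio of the fastest SH-4 case (40 cycles) to the SH-3E case (287 cycles).
cycle_speedup = 287 / 40
print(f"cycle speedup: {cycle_speedup:.1f}x")
```

Because the clock is held fixed, the speedup is simply the ratio of the two cycle counts.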
The SH-4 achieved excellent media-processing efficiency. Its cycle performance
and frequency were 7.2 and 1.5 times as high as those of the SH-3E in the
same process. Therefore, the media performance in the same process was
7.2 × 1.5 = 10.8 times as high. The FPU area of the SH-3E was estimated to be
3 mm² and that of the SH-4 was 8 mm² in a 0.25-µm process, so the SH-4 FPU was
8/3 = 2.7 times as large as that of the SH-3E. As a result, the SH-4 achieved
10.8/2.7 = 4.0 times as high area efficiency as the SH-3E for media processing.
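The area-efficiency argument chains three ratios, which can be written out as a minimal sketch. The variable names are assumptions for illustration, and the area ratio is rounded to one decimal place to mirror the rounding used in the text.

```python
# Sketch: area efficiency = (performance ratio) / (area ratio), SH-4 relative to SH-3E.
cycle_ratio = 7.2   # cycle-performance ratio quoted in the text (287/40, rounded)
freq_ratio = 1.5    # frequency ratio in the same process, quoted in the text

perf_ratio = cycle_ratio * freq_ratio      # media performance ratio, about 10.8

sh3e_fpu_area_mm2 = 3.0                    # estimated FPU area of the SH-3E
sh4_fpu_area_mm2 = 8.0                     # FPU area of the SH-4
area_ratio = round(sh4_fpu_area_mm2 / sh3e_fpu_area_mm2, 1)  # 2.7, rounded as in the text

area_efficiency_ratio = perf_ratio / area_ratio

print(f"performance ratio:     {perf_ratio:.1f}")
print(f"area ratio:            {area_ratio:.1f}")
print(f"area efficiency ratio: {area_efficiency_ratio:.1f}")
```

Using the unrounded area ratio 8/3 instead would change the final value only slightly.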
The SH-3E consumed similar power for both Dhrystone and the 3D benchmark.
On the other hand, the SH-4 consumed 2.2 times as much power for the 3D
benchmark as for Dhrystone. As described in Sect. 3.1.2.7, the power consumptions
of the SH-3 and SH-4 ported to a 0.18-µm process were 170 and 240 mW at 133 MHz
and a 1.5 V power supply for Dhrystone. Therefore, the power of the SH-4 was
240 × 2.2/170 = 3.3 times as high as that of the SH-3. The corresponding
performance ratio is 7.2 times because they run at the same frequency after the
porting. As a result, the SH-4 achieved 7.2/3.3 = 2.18 times as high power
efficiency as the SH-3E.
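The power-efficiency estimate follows the same pattern, dividing the performance ratio by the power ratio. The sketch below uses only the figures quoted in the text, with names chosen for illustration. Recomputing the power ratio from the rounded inputs gives roughly 3.1 rather than the quoted 3.3; the assumption made here is that the quoted value reflects less-rounded measurements, so both are shown.

```python
# Sketch: power efficiency ratio = (performance ratio) / (power ratio).
SH3_DHRYSTONE_MW = 170      # SH-3 Dhrystone power after porting, from the text
SH4_DHRYSTONE_MW = 240      # SH-4 Dhrystone power after porting, from the text
SH4_3D_POWER_FACTOR = 2.2   # SH-4 3D-benchmark power relative to its Dhrystone power
CYCLE_PERF_RATIO = 7.2      # performance ratio at equal frequency, from the text
QUOTED_POWER_RATIO = 3.3    # power ratio as stated in the text


def efficiency_ratio(perf_ratio, power_ratio):
    """Relative efficiency: how much more work is done per unit of power."""
    return perf_ratio / power_ratio


# Power ratio recomputed from the rounded inputs (an illustrative cross-check).
recomputed_power_ratio = SH4_DHRYSTONE_MW * SH4_3D_POWER_FACTOR / SH3_DHRYSTONE_MW

print(f"recomputed power ratio:   {recomputed_power_ratio:.2f}")
print(f"efficiency (quoted 3.3):  {efficiency_ratio(CYCLE_PERF_RATIO, QUOTED_POWER_RATIO):.2f}")
print(f"efficiency (recomputed):  {efficiency_ratio(CYCLE_PERF_RATIO, recomputed_power_ratio):.2f}")
```

Either way the SH-4 comes out at a little over twice the power efficiency for this workload.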
The actual efficiencies, including the process contribution, are
0.35-M polygons/s/W for the SH-3E (60 MHz/287 cycles = 0.21-M polygons/s at
0.6 W) and 2.5-M polygons/s/W for the SH-4 (5.0-M polygons/s at 2 W).
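The absolute figures can be reproduced with a small helper, shown here as a sketch. The function name `mpolygons_per_watt` is an assumption for illustration; the clock, cycle count, throughput, and power values are those quoted in the text.

```python
# Sketch: absolute power efficiency in M polygons/s per watt.
def mpolygons_per_watt(mpolygons_per_s, power_w):
    """Throughput in M polygons/s divided by power in watts."""
    return mpolygons_per_s / power_w


# SH-3E: 60 MHz clock, 287 cycles per polygon, 0.6 W.
sh3e_mpps = 60e6 / 287 / 1e6
sh3e_eff = mpolygons_per_watt(sh3e_mpps, 0.6)

# SH-4: 5.0 M polygons/s at 2 W.
sh4_eff = mpolygons_per_watt(5.0, 2.0)

print(f"SH-3E: {sh3e_mpps:.2f} M polygons/s, {sh3e_eff:.2f} M polygons/s/W")
print(f"SH-4:  5.00 M polygons/s, {sh4_eff:.2f} M polygons/s/W")
```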
3.1.6 Efficient Frequency Enhancement of SH-X FPU
The floating-point architecture and microarchitecture extension of the SH-4
achieved high multimedia performance and efficiency as described in Sect. 3.1.5.
This was mainly from the parallelization by the vector instructions of FIPR and
 