SH-4 was 240/170 = 1.41 times as high as that of the SH-3. As a result, the SH-4
maintained the power efficiency of the cycle performance, which is calculated as
1.65/1.41 = 1.17. The actual efficiencies, including the process contribution, were
147 MIPS/0.17 W = 865 MIPS/W for the SH-3 and 240 MIPS/0.24 W = 1,000
MIPS/W for the SH-4. Although a conventional superscalar processor was thought
to be less efficient than a scalar processor, the SH-4 was more efficient than a scalar
processor. Under other operating conditions, the SH-4 achieved 166 MHz at 1.8 V with
400 mW and 240 MHz at 1.95 V with 700 mW, and the corresponding efficiencies
were 300 MIPS/0.4 W = 750 MIPS/W and 432 MIPS/0.7 W = 617 MIPS/W.
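The efficiency figures above can be reproduced with a few lines of arithmetic. The following is a minimal sketch using only the numbers quoted in the text. The labels and variable names are illustrative. It assumes that 240/170 is the ratio of the two power figures in mW (consistent with 0.24 W and 0.17 W) and that 1.65 is the cycle-performance ratio, since the sentence introducing these values is cut off in this excerpt.

```python
# Power efficiency (MIPS per watt) from the figures quoted in the text.
# Labels are illustrative; the values are taken directly from the paragraph above.
operating_points = [
    # (label, performance in MIPS, power in watts)
    ("SH-3", 147, 0.17),
    ("SH-4", 240, 0.24),
    ("SH-4 at 166 MHz, 1.8 V", 300, 0.40),
    ("SH-4 at 240 MHz, 1.95 V", 432, 0.70),
]


def mips_per_watt(mips, watts):
    """Return the power efficiency in MIPS/W."""
    return mips / watts


efficiencies = {
    label: round(mips_per_watt(mips, watts))
    for label, mips, watts in operating_points
}

# Assumption: 240/170 compares SH-4 power to SH-3 power (both in mW),
# and 1.65 is the SH-4/SH-3 cycle-performance ratio.
power_ratio = 240 / 170
cycle_efficiency_ratio = 1.65 / power_ratio

for label, value in efficiencies.items():
    print(f"{label}: {value} MIPS/W")
print(f"power ratio: {power_ratio:.2f}")
print(f"cycle-performance efficiency ratio: {cycle_efficiency_ratio:.2f}")
```

The two higher-frequency operating points show the usual trade-off: raising the supply voltage buys frequency but costs efficiency, dropping from 1,000 MIPS/W to 750 and 617 MIPS/W.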
3.1.3 Efficient Frequency Enhancement of SH-X
The asymmetric superscalar architecture of the SH-4 achieved high performance and
efficiency. However, further parallelism would not improve the performance
because the parallelism available in a general program is limited. On the other hand,
without a fundamental change to the architecture or microarchitecture, the operating
frequency would be limited by the applied process. Although the conventional
superpipeline architecture was thought to be inefficient, as the conventional
superscalar architecture had been before the SH-4 [47, 48], the SH-X embedded
processor core was developed with a superpipeline architecture to enhance the
operating frequency while maintaining the high efficiency of the SH-4.
3.1.3.1 Microarchitecture Selections
The SH-X adopted a seven-stage superpipeline to maintain efficiency, chosen from
among the various stage counts used by other processors, which range up to a highly
superpipelined 20 stages [48]. The seven-stage pipeline degraded the cycle performance
compared to the five-stage one. Therefore, appropriate methods were chosen to
enhance and recover the cycle performance, with a careful trade-off between
performance and efficiency. Table 3.5 summarizes the resulting microarchitecture
selections.
Out-of-order issue was a popular method used by high-end processors to
enhance the cycle performance. However, it required much hardware and
was too inefficient, especially for general-purpose register handling. The SH-X
therefore adopted in-order issue, except for some branches that use no
general-purpose register.
The branch penalty was a serious problem for the superpipeline architecture. In
addition to the method of the SH-4, the SH-X adopted branch prediction and
out-of-order branch issue, but adopted neither the more expensive approach of a
branch target buffer (BTB) nor the incompatible approach of plural instructions.
Branch prediction is categorized into static and dynamic methods, and the static
ones require an architecture change to insert the static prediction result into the
instruction. Therefore, the SH-X adopted a dynamic method with a branch history
table (BHT) and a global history.
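To illustrate what dynamic prediction with a BHT and a global history can look like, here is a minimal sketch of a generic gshare-style predictor. It is not the SH-X implementation: the text does not give the table size, the counter width, or how the BHT is indexed, so the 2-bit saturating counters, the 10-bit index, the XOR of the branch address with the global history, and all names (`GlobalHistoryPredictor`, `run`) are assumptions made for illustration.

```python
class GlobalHistoryPredictor:
    """Generic gshare-style predictor (illustrative, not the SH-X design)."""

    def __init__(self, index_bits=10):
        # Assumption: table size and counter width are not given in the text.
        self.mask = (1 << index_bits) - 1
        # One 2-bit saturating counter per entry, 0..3; start weakly not-taken.
        self.bht = [1] * (1 << index_bits)
        # Shift register holding the outcomes of the most recent branches.
        self.global_history = 0

    def _index(self, pc):
        # Assumption: index = branch address XOR global history.
        # The shift by 1 assumes 16-bit (2-byte) instructions.
        return ((pc >> 1) ^ self.global_history) & self.mask

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.bht[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the counter with the real outcome, then shift the history."""
        i = self._index(pc)
        if taken:
            self.bht[i] = min(3, self.bht[i] + 1)
        else:
            self.bht[i] = max(0, self.bht[i] - 1)
        self.global_history = ((self.global_history << 1) | int(taken)) & self.mask


def run(predictor, trace):
    """Replay (pc, taken) pairs and return the fraction predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


if __name__ == "__main__":
    # A branch that alternates taken/not-taken defeats a per-branch counter,
    # but the global history lets the table learn the pattern.
    alternating = [(0x200, i % 2 == 0) for i in range(200)]
    print(run(GlobalHistoryPredictor(), alternating))
```

Because the prediction state lives entirely in hardware tables, a scheme of this kind needs no change to the instruction encoding, which is the property the text cites as the reason for preferring dynamic over static prediction.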