Processor Cores - Heterogeneous Multicore Processor Technologies for Embedded Systems


SH-4 was 240/170 = 1.41 times as high as that of the SH-3. As a result, the SH-4
maintained its power efficiency in terms of cycle performance, which is calculated as
1.65/1.41 = 1.17. The actual efficiencies, including the process contribution, were
147 MIPS/0.17 W = 865 MIPS/W for the SH-3 and 240 MIPS/0.24 W = 1,000 MIPS/W
for the SH-4. Although conventional superscalar processors were thought to be less
efficient than scalar processors, the SH-4 was more efficient than a scalar one.
Under other operating conditions, the SH-4 achieved 166 MHz at 1.8 V with 400 mW
and 240 MHz at 1.95 V with 700 mW, and the corresponding efficiencies were
300 MIPS/0.4 W = 750 MIPS/W and 432 MIPS/0.7 W = 617 MIPS/W.
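
The efficiency figures quoted above are plain ratios of performance to power. As a quick check, the following minimal sketch recomputes them from the MIPS and wattage values given in the text. The labels and function names are illustrative only and do not come from the source.

```python
# Operating points quoted in the text: (label, performance in MIPS, power in watts).
# The labels are illustrative; only the numbers come from the text.
OPERATING_POINTS = [
    ("SH-3", 147, 0.17),
    ("SH-4", 240, 0.24),
    ("SH-4 at 166 MHz, 1.8 V", 300, 0.40),
    ("SH-4 at 240 MHz, 1.95 V", 432, 0.70),
]


def mips_per_watt(mips, watts):
    """Power efficiency as performance divided by power."""
    return mips / watts


def efficiencies(points=OPERATING_POINTS):
    """Map each label to its efficiency, rounded to the nearest MIPS/W."""
    return {label: round(mips_per_watt(m, w)) for label, m, w in points}


if __name__ == "__main__":
    for label, value in efficiencies().items():
        print(f"{label}: {value} MIPS/W")
```

The two higher-frequency SH-4 points deliver more MIPS but at lower MIPS/W, which is consistent with the higher supply voltages quoted for them.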

3.1.3 Efficient Frequency Enhancement of SH-X

The asymmetric superscalar architecture of the SH-4 achieved high performance and
efficiency. However, further parallelism would not contribute much to performance
because of the limited parallelism of general programs. On the other hand, the
operating frequency would be limited by the applied process unless the architecture
or microarchitecture was fundamentally changed. Although conventional superpipeline
architecture was thought to be inefficient, as conventional superscalar architecture
had been before the SH-4 [47, 48], the SH-X embedded processor core was developed
with a superpipeline architecture to enhance the operating frequency while
maintaining the high efficiency of the SH-4.

3.1.3.1 Microarchitecture Selections

The SH-X adopted a seven-stage superpipeline to maintain efficiency. This depth was
chosen from among the various numbers of stages used in various processors, which
range up to highly superpipelined 20-stage designs [48]. The seven-stage pipeline
degraded the cycle performance compared with the five-stage one. Therefore,
appropriate methods were chosen to enhance and recover the cycle performance, with
careful trade-off judgments between performance and efficiency. Table 3.5 summarizes
the resulting microarchitecture selections.

Out-of-order issue was a popular method used by high-end processors to enhance
cycle performance. However, it required much hardware and was too inefficient,
especially for general-purpose register handling. The SH-X adopted in-order issue,
except for some branches that use no general-purpose register.

The branch penalty was a serious problem for the superpipeline architecture. In
addition to the method of the SH-4, the SH-X adopted branch prediction and
out-of-order branch issue, but did not adopt the more expensive approach with a
branch target buffer (BTB) or the incompatible approach with plural instructions.
Branch prediction is categorized into static and dynamic methods, and static methods
require an architecture change to insert the static prediction result into the
instruction. Therefore, the SH-X adopted a dynamic method with a branch history
table (BHT) and a global history.
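
To illustrate how a BHT can work together with a global history, the following toy model may help. It is a sketch under stated assumptions, not the SH-X design: the text does not give the table size, the counter width, or the indexing scheme. The 2-bit saturating counters, the 1,024-entry table, and the XOR of the branch address with the history (a gshare-style index) are all choices made here for illustration, as are the class and function names.

```python
class GlobalHistoryPredictor:
    """Toy dynamic branch predictor: a BHT of 2-bit counters plus a global history.

    Assumptions (not from the source): table size, 2-bit saturating counters,
    and indexing by XOR of the branch address with the global history register.
    """

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Counter values 0..3; values >= 2 predict taken. Start weakly not-taken.
        self.bht = [1] * (1 << index_bits)
        self.history = 0  # most recent branch outcome is held in the lowest bit

    def _index(self, pc):
        # Drop the lowest address bit, assuming 2-byte-aligned instructions.
        return ((pc >> 1) ^ self.history) & self.mask

    def predict(self, pc):
        """Return True if the branch at address pc is predicted taken."""
        return self.bht[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the counter with the real outcome, then shift it into the history."""
        i = self._index(pc)
        if taken:
            self.bht[i] = min(3, self.bht[i] + 1)
        else:
            self.bht[i] = max(0, self.bht[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def simulate(predictor, pc, outcomes):
    """Return, for each outcome, whether the prediction matched it."""
    hits = []
    for taken in outcomes:
        hits.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return hits


if __name__ == "__main__":
    # A branch that alternates taken / not taken; the address is hypothetical.
    pattern = [i % 2 == 0 for i in range(200)]
    hits = simulate(GlobalHistoryPredictor(), 0x1000, pattern)
    print(f"correct predictions in the last 100 branches: {sum(hits[100:])}")
```

In this model, an alternating branch is predicted correctly once the history register has filled and the two counters it selects have been trained. A single 2-bit counter per branch, with no history, cannot do that, which is one reason to combine the table with a global history. Because prediction state lives entirely in hardware tables, no change to the instruction encoding is needed, in line with the reason the text gives for preferring a dynamic method.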
