Processor Cores - Heterogeneous Multicore Processor Technologies for Embedded Systems


Table 3.1  Microarchitecture selections of SH-4

                      Selections            Other candidates                  Merits
Number of issues      Dual                  Scalar, triple, quad              Maintaining high efficiency
Issue order           In-order              Out-of-order
Resource duplication  Asymmetric            Duplicated (symmetric)
Important category    Transfer              Memory access, arithmetic         Good for two-operand ISA
Latency concealing    Zero-cycle transfer   Delayed execution, store buffers
Internal memories     Harvard architecture  Unified cache                     Simultaneous access
Branch acceleration   Delayed branch,       Branch prediction, out-of-order   Simple, small, compatible
                      early-stage branch    issue, branch target buffer,
                                            separated instructions

the duplicated resources were not often used simultaneously, and the architecture would not achieve high efficiency.

All instructions were categorized to reduce the pipeline hazards caused by resource conflicts, which a symmetric architecture would avoid only at the expense of resource duplication. In particular, an instruction that transfers a literal or register value is important in a 16-bit fixed-length ISA, so the transfer instructions were categorized as a type that could use either the execution pipeline or the load/store pipeline as appropriate. Furthermore, a zero-cycle transfer operation was implemented for the transfer instructions, which helped reduce the hazards further.
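
The effect of categorizing instructions can be illustrated with a minimal sketch. The three categories and the pipeline names below are hypothetical simplifications introduced for illustration, not the actual SH-4 grouping table; the only idea taken from the text is that a transfer instruction may go to either pipeline, while other instructions are bound to one.

```python
# Hypothetical, simplified categories (assumption: not the real SH-4 table).
# Each category maps to the set of pipelines that can execute it.
PIPES = {
    "EX": {"ex"},        # arithmetic: execution pipeline only
    "LS": {"ls"},        # memory access: load/store pipeline only
    "MT": {"ex", "ls"},  # transfer: either pipeline
}


def can_dual_issue(first, second):
    """Return True if the two instructions can be placed on distinct pipelines."""
    return any(a != b for a in PIPES[first] for b in PIPES[second])


if __name__ == "__main__":
    for pair in [("EX", "EX"), ("EX", "LS"), ("MT", "EX"), ("MT", "MT")]:
        print(pair, can_dual_issue(*pair))
```

In this toy model, two arithmetic instructions conflict on the single execution pipeline, whereas a transfer instruction pairs with anything, which is why treating transfers as a separate category reduces resource-conflict hazards without duplicating the pipelines.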

As for the memory architecture, the Harvard architecture was popular for PC/server processors because it enables simultaneous accesses to the instruction and data caches, whereas a unified cache architecture was popular for embedded processors because it reduces hardware cost and makes efficient use of a relatively small cache. The SH-4 adopted the Harvard architecture, which was necessary to avoid the memory access conflicts that increase with superscalar issue.
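
A rough way to see the conflict is to count cycles in a toy model. The sketch below assumes a single-ported unified cache that serves one access per cycle, an instruction fetch on every step, and no cache misses; the function name and these assumptions are illustrative and are not taken from the SH-4 design.

```python
def access_cycles(data_access_flags, harvard):
    """Cycles needed when every step fetches instructions and may access data.

    data_access_flags: one boolean per step, True if the step also needs data.
    harvard: True for split instruction/data caches, False for a unified cache.
    """
    cycles = 0
    for has_data_access in data_access_flags:
        # Split caches serve the fetch and the data access in the same cycle;
        # a single-ported unified cache must serialize them.
        cycles += 1 if (harvard or not has_data_access) else 2
    return cycles


if __name__ == "__main__":
    steps = [True, False, True, True]
    print(access_cycles(steps, harvard=True))   # split caches
    print(access_cycles(steps, harvard=False))  # unified cache
```

Under these assumptions, every step that both fetches and accesses data costs an extra cycle with the unified cache, and dual issue raises the fraction of such steps because more memory-access instructions are in flight per cycle.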

The SH architecture had adopted a delayed branch to reduce the branch penalty cycles, and the SH-4 added an early-stage branch to reduce the penalty further. The penalty cycles increased with the superscalar issue, but not as much as in a superpipelined processor with deep pipeline stages, so the SH-4 did not adopt more expensive methods such as a branch target buffer (BTB), out-of-order issue of branch instructions, or branch prediction. The SH-4 also maintained backward compatibility and therefore did not adopt methods that require an ISA change, such as using multiple instructions for a single branch.
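
The trade-off can be sketched with a generic textbook-style model of taken-branch cost. The stage numbering (fetch is stage 1), the function name, and the example values are assumptions for illustration only and do not describe the actual SH-4 timing.

```python
def wasted_issue_slots(resolve_stage, issue_width, delay_slot_filled):
    """Issue slots lost on a taken branch in a simple in-order pipeline.

    resolve_stage: 1-indexed stage at which the branch target is known.
    issue_width: instructions issued per cycle (1 = scalar, 2 = dual issue).
    delay_slot_filled: True if one useful instruction fills the delay slot.
    """
    bubble_cycles = resolve_stage - 1     # fetch cycles spent before the redirect
    slots = bubble_cycles * issue_width   # each bubble cycle loses a full issue group
    if delay_slot_filled:
        slots -= 1                        # the delay slot recovers one instruction
    return max(slots, 0)


if __name__ == "__main__":
    print(wasted_issue_slots(3, 1, False))  # scalar, late resolution
    print(wasted_issue_slots(2, 1, True))   # scalar, early branch + delay slot
    print(wasted_issue_slots(2, 2, True))   # dual issue, early branch + delay slot
```

In this model, resolving the branch one stage earlier removes a whole bubble cycle, while widening the issue raises the number of slots lost per bubble; a shallow pipeline keeps that product small, which is consistent with the choice of simple, low-cost branch handling.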

As a result of these selections, the SH-4 adopted an in-order, dual-issue, asymmetric five-stage superscalar pipeline and a Harvard architecture, with special treatment of transfer instructions including the zero-cycle transfer method.
