Table 3.5 Microarchitecture selections of SH-X

                     Selections             Other candidates           Merits
-------------------  ---------------------  -------------------------  -------------------------------
Pipeline stages      7                      5, 6, 8, 10, 15, 20        1.4 times frequency enhancement
Branch acceleration  Out-of-order issue     BTB, branch with plural    Compatibility, small area,
                                            instructions               for low frequency branch
Branch prediction    Dynamic (BHT, global   Static (fixed direction,
                     history)               hint bit in instruction)
Latency concealing   Delayed execution,     Out-of-order issue         Simple, small
                     store buffers
[Figure: pipeline diagram. Stages I1, I2, ID, E1, E2, E3, E4, E5, and E6 across the BR, INT, LS, and FE pipelines. Labels: Early Branch, Instruction Fetch, Branch, Instruction Decoding, FPU Instruction Decoding, Address, Execution, Data Load/Store, FPU Data Transfer, FPU Arithmetic Execution, WB.]
Fig. 3.6 Conventional seven-stage superpipeline structure
Load/store latencies were also a serious problem. Out-of-order issue would
have hidden them effectively but, as mentioned above, was too inefficient to adopt.
The SH-X instead adopted delayed execution and a store buffer as more efficient methods.
The selected methods effectively reduced the pipeline hazards caused by the
superpipeline architecture, but they could not avoid the long stall caused by a
cache miss that requires an external memory access. Such a stall could be avoided
by an out-of-order architecture with large-scale buffers, but it was not a serious
problem for embedded systems.
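As a rough illustration of why a store buffer hides store latency, the sketch below models a generic FIFO store buffer in Python. It is not the SH-X design: the class name `StoreBuffer`, the default depth of four entries, the store-to-load forwarding, and the dictionary standing in for memory are all assumptions made for the example.

```python
from collections import deque


class StoreBuffer:
    """Toy FIFO store buffer; depth and forwarding policy are assumptions."""

    def __init__(self, memory, capacity=4):
        self.memory = memory        # dict: address -> value (stand-in for memory)
        self.capacity = capacity    # hypothetical buffer depth
        self.pending = deque()      # (address, value) pairs, oldest first

    def store(self, address, value):
        """Queue a store; return True if the pipeline had to stall for space."""
        stalled = len(self.pending) >= self.capacity
        if stalled:
            self.drain_one()        # free a slot before accepting the store
        self.pending.append((address, value))
        return stalled

    def load(self, address):
        """Return the newest pending value for the address, else read memory."""
        for pending_address, value in reversed(self.pending):
            if pending_address == address:
                return value
        return self.memory.get(address, 0)

    def drain_one(self):
        """Commit the oldest pending store to memory, if there is one."""
        if self.pending:
            address, value = self.pending.popleft()
            self.memory[address] = value
```

In this model a store completes from the pipeline's point of view as soon as it is queued, and it stalls only when the buffer is full. A long cache-miss stall on a load is not helped by the buffer, which matches the limitation noted in the text.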
3.1.3.2 Improved Superpipeline Architecture
Figure 3.6 illustrates a conventional seven-stage superpipeline structure based on
the ISA and instruction categorization of the SH-4. The seven stages consist of the
first and second instruction fetch stages (I1 and I2) and an instruction decoding
stage (ID), which are common to all the pipelines, followed by the first to fourth
execution stages (E1, E2, E3, and E4) of the INT, LS, and FE pipelines. The FE
pipeline has nine stages because it adds two extra execution stages, E5 and E6.
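The stage structure described above can be written down as a small lookup table. The following Python sketch is only illustrative: the stage and pipeline names come from the text, while the names `FRONT_END`, `EXECUTION`, `stages`, and `depth` are hypothetical helpers introduced for the example. The BR pipeline is left out because the text does not list its stages.

```python
# Stages shared by every pipeline (from the text).
FRONT_END = ["I1", "I2", "ID"]

# Execution stages per pipeline (from the text); the dict layout is an assumption.
EXECUTION = {
    "INT": ["E1", "E2", "E3", "E4"],
    "LS": ["E1", "E2", "E3", "E4"],
    "FE": ["E1", "E2", "E3", "E4", "E5", "E6"],
}


def stages(pipeline):
    """Return the full ordered stage list for one pipeline."""
    return FRONT_END + EXECUTION[pipeline]


def depth(pipeline):
    """Return the number of stages in one pipeline."""
    return len(stages(pipeline))
```

With these definitions, `depth("INT")` and `depth("LS")` evaluate to 7 and `depth("FE")` evaluates to 9, matching the seven-stage and nine-stage counts given in the text.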
The I1, I2, and ID stages correspond to the IF and ID stages of the SH-4, and the
E1, E2, and E3 stages correspond to its EX and MA stages. In each case three stages
replace two, so the same processing time is divided into 1.5 times as many stages
as in the SH-4. Then, the operating
 