Fig. 3.9 Example of flexible forwarding
[Pipeline diagram with two panels. Early register release: Add (FADD.S FR0, FR1; stages E1 to E7), Store (FMOV.S FR1, @R0; stages E1 to E5), Load (FMOV.S @R1, FR1; stages E1 to E5), annotated "1 cycle". Late register allocation: Add (FADD.S FR0, FR1; stages E1 to E7), Copy (FMOV FR1, FR2; stages E1 to E5), Store (FMOV.S FR2, @R0; stages E1 to E5), annotated "5 cycles".]
Fig. 3.10 Branch execution sequence of superpipeline architecture
[Pipeline diagram. Rows: Compare, Branch, Delay Slot, Target. Stages: I1, I2, IQ, ID, E1 to E4. Two gaps of 2 cycles each are marked as empty issue slots, one before the Branch and Delay Slot issue and one before the Target issue.]
3.1.3.3 Branch Prediction and Out-of-Order Branch Issue
Figure 3.10 illustrates the branch performance degradation of a superpipeline
architecture, using a program sequence consisting of compare, conditional-branch,
delay-slot, and branch-target instructions. The architecture is assumed to be the
same superpipeline architecture as that of the SH-X, except for the branch
architecture, which is the same as that of the SH-4.
The conditional-branch and delay-slot instructions are issued three cycles after
the compare instruction, and the branch-target instruction is issued three cycles
after the branch. Because of delayed execution, the compare operation starts at
the E2 stage, and its result is available in the middle of the E3 stage. The
conditional-branch instruction then checks the result in the latter half of its ID
stage and generates the target address in the same ID stage, which is followed by
the I1 and I2 stages of the target instruction. As a result, the sequence incurs
eight empty issue slots, or four stall cycles, as illustrated. This means that only
one third of the issue slots are used for the sequence.
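The stall count above can be checked with a little arithmetic. The following is a minimal sketch, not a pipeline model: it assumes two issue slots per cycle (inferred from "eight empty issue slots or four stall cycles") and assumes that consecutive issue groups with no stall are issued one cycle apart. The names `ISSUE_WIDTH`, `issue_cycle`, and `stall_cycles` are illustrative only.

```python
ISSUE_WIDTH = 2  # assumption: two issue slots per cycle

# Issue cycle of each instruction group, relative to the compare instruction,
# as stated in the text (three cycles apart each time).
issue_cycle = {"compare": 0, "branch + delay slot": 3, "target": 6}

def stall_cycles(cycles):
    """Sum the idle cycles between consecutive issue groups.

    Assumes back-to-back issue (a gap of one cycle) means zero stalls.
    """
    ordered = sorted(cycles)
    return sum(later - earlier - 1 for earlier, later in zip(ordered, ordered[1:]))

stalls = stall_cycles(issue_cycle.values())
empty_slots = stalls * ISSUE_WIDTH
print(stalls, empty_slots)  # 4 8
```

Each three-cycle gap contributes two stall cycles, which matches the two "2 cycles" annotations in Fig. 3.10.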
The SH-4 could execute the same four-instruction sequence with two empty
issue slots, or a one-cycle stall, so that four of six issue slots were used for
the sequence, as described in Sect. 3.1.2.6. The branch performance of the
superpipeline architecture was therefore seriously degraded, and its cycle
performance had to be recovered.
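The two utilization figures quoted above follow from the same simple formula. The sketch below is illustrative only: it assumes two issue slots per cycle and counts only the four instructions of the sequence as used slots. The function name `slot_utilization` is hypothetical.

```python
from fractions import Fraction

def slot_utilization(instructions, stall_cycles, issue_width=2):
    """Fraction of issue slots that carry an instruction of the sequence.

    Assumes every stall cycle wastes `issue_width` slots and every
    instruction occupies exactly one slot.
    """
    empty = stall_cycles * issue_width
    return Fraction(instructions, instructions + empty)

superpipeline = slot_utilization(instructions=4, stall_cycles=4)
sh4 = slot_utilization(instructions=4, stall_cycles=1)
print(superpipeline, sh4)  # 1/3 2/3
```

Under these assumptions the superpipeline case uses 4 of 12 slots and the SH-4 case uses 4 of 6, so the deeper pipeline halves the slot utilization for this sequence.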
Figure 3.11 illustrates the execution sequence of the SH-X. The branch operation
can start with no pipeline stall by a branch prediction, which predicts the branch
 