Fig. 3.2 Branch sequence of a scalar processor
[Pipeline diagram: the branch, delay slot, and target instructions pass through the IF, ID, EX, MA, and WB stages, with an empty issue slot marked; the sequence takes 4 cycles.]
Fig. 3.3 Branch sequence of a superscalar processor
[Pipeline diagram: the branch, delay slot, and target instructions pass through the IF, ID, EX, MA, and WB stages, with the empty issue slots marked; the sequence takes 4 cycles.]
Fig. 3.4 Branch sequence of SH-4 with early-stage branch
[Pipeline diagram: the compare, branch, delay slot, and target instructions pass through the IF, ID, EX, MA, and WB stages, with the empty issue slots marked; the sequence takes 3 cycles.]
3.1.2.6 Early-Stage Branch
The SH-4 adopted an early-stage branch to reduce the branch penalty, which the
superscalar architecture had increased. Figures 3.2 to 3.4 illustrate the branch
sequences of a scalar processor, a superscalar processor, and the SH-4 with the
early-stage branch, respectively. Each sequence consists of branch, delay slot, and
target instructions. In the SH-4 case, a compare instruction, which often comes
immediately before the conditional branch instruction, is also shown to clarify the
define-use distance of the branch condition between the EX stage of the compare
instruction and the ID stage of the branch instruction.
Both the scalar and superscalar processors execute the three instructions in the
same four cycles. The superscalar architecture therefore yields no performance gain
for this sequence, and the number of empty issue slots becomes three or four times
larger. In contrast, the SH-4 executes the three instructions in three cycles with
one or two empty issue slots. A branch without a delay slot requires one more empty
issue slot in all three cases. As the example sequences show, the early-stage branch
improved the SH-4's performance and reduced the number of empty issue slots.
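
The comparison above can be restated as simple arithmetic. The following Python sketch tabulates only the cycle counts and empty-issue-slot ranges given in the text and derives the relative speed of the three-instruction sequence. The names SEQUENCES, speedup, and empty_slot_ratio are hypothetical and chosen for illustration. The scalar count of one empty issue slot is an assumption inferred from Fig. 3.2 and from the "three or four times" comparison. The sketch does not model the pipeline itself.

```python
from fractions import Fraction

# Figures taken from the text for the sequence branch, delay slot, target.
# "empty_slots" is a (min, max) range. The scalar value of 1 is an assumption
# inferred from Fig. 3.2 and the "three or four times" statement.
SEQUENCES = {
    "scalar": {"cycles": 4, "empty_slots": (1, 1)},
    "superscalar": {"cycles": 4, "empty_slots": (3, 4)},
    "sh4_early_branch": {"cycles": 3, "empty_slots": (1, 2)},
}


def speedup(baseline: str, improved: str) -> Fraction:
    """Ratio of cycle counts for the same three-instruction sequence."""
    return Fraction(SEQUENCES[baseline]["cycles"], SEQUENCES[improved]["cycles"])


def empty_slot_ratio(baseline: str, other: str) -> tuple:
    """(min, max) ratio of empty issue slots, assuming a fixed baseline count."""
    base_min, base_max = SEQUENCES[baseline]["empty_slots"]
    if base_min != base_max:
        raise ValueError("baseline must have a single empty-slot count")
    lo, hi = SEQUENCES[other]["empty_slots"]
    return (Fraction(lo, base_min), Fraction(hi, base_min))


if __name__ == "__main__":
    for name in ("superscalar", "sh4_early_branch"):
        print(name, "speedup over scalar:", speedup("scalar", name))
    print("superscalar empty-slot ratio:", empty_slot_ratio("scalar", "superscalar"))
```

Under these figures, the superscalar processor shows a speedup of 1 over the scalar processor for this sequence, while the SH-4 with the early-stage branch completes it in three quarters of the cycles.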
 