Hardware Reference
In cycle 6, I6 stalls because it needs to write into R1 and R1 is busy. It is
finally started in cycle 9. The entire sequence of eight instructions takes 18
cycles to complete due to many dependences, even though the hardware is capable
of issuing two instructions on every cycle. Notice, however, when reading down
the Iss column of Fig. 4-43, that all the instructions have been issued in order.
Likewise, the Ret column shows that they have been retired in order as well.
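As a rough illustration of the in-order policy described above, the following Python sketch models a scoreboard with per-register read and write counters and an issue step that stops at the first instruction that cannot start. The class and function names, the register numbering, and the exact set of three hazard checks are assumptions made for this sketch. It ignores functional-unit latencies and does not try to reproduce the cycle counts of Fig. 4-43.

```python
from dataclasses import dataclass, field

NUM_REGS = 8  # assumption: registers 0-7, as in the figure's column headings


@dataclass(frozen=True)
class Instr:
    name: str
    dest: int      # register written by the instruction
    srcs: tuple    # registers read by the instruction


@dataclass
class Scoreboard:
    # Hypothetical layout: one counter per register for pending reads and writes.
    reading: list = field(default_factory=lambda: [0] * NUM_REGS)
    writing: list = field(default_factory=lambda: [0] * NUM_REGS)

    def can_issue(self, ins):
        raw = any(self.writing[s] > 0 for s in ins.srcs)  # operand still being written
        war = self.reading[ins.dest] > 0                  # result register still being read
        waw = self.writing[ins.dest] > 0                  # result register still being written
        return not (raw or war or waw)

    def issue(self, ins):
        for s in ins.srcs:
            self.reading[s] += 1
        self.writing[ins.dest] += 1

    def retire(self, ins):
        for s in ins.srcs:
            self.reading[s] -= 1
        self.writing[ins.dest] -= 1


def issue_in_order(pending, sb, width=2):
    """Issue up to `width` instructions, stopping at the first one that stalls."""
    issued = []
    while pending and len(issued) < width and sb.can_issue(pending[0]):
        ins = pending.pop(0)
        sb.issue(ins)
        issued.append(ins)
    return issued


# Usage: an earlier (hypothetical) instruction is still writing R1.
sb = Scoreboard()
earlier = Instr("earlier", dest=1, srcs=(0, 2))
sb.issue(earlier)

pending = [Instr("A", dest=1, srcs=(0, 2)), Instr("B", dest=3, srcs=(4, 5))]
stalled = issue_in_order(pending, sb)   # A needs R1, so nothing issues; B waits behind A

sb.retire(earlier)
resumed = issue_in_order(pending, sb)   # R1 is free again, so A and B both issue
```

Note that B has no conflict of its own, yet it is held back because the in-order rule never looks past the stalled instruction at the head of the queue.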
Now let us consider an alternative design: out-of-order execution. In this
design, instructions may be issued out of order and may be retired out of order
as well. The same sequence of eight instructions is shown in Fig. 4-44, only now
with out-of-order issue and out-of-order retirement permitted.
[Figure: table with columns Cy, #, Decoded, Iss, Ret, Registers being read (0-7), and Registers being written (0-7). Decoded instructions, in order: I1 R3=R0*R1; I2 R4=R0+R2; I3 R5=R0+R1; I4 R6=R1+R4; I5 R7=R1*R2; I6 S1=R0-R2; I7 R3=R3*S1; I8 S2=R4+R4.]
Figure 4-44. Operation of a superscalar CPU with out-of-order issue and
out-of-order completion.
The first difference occurs in cycle 3. Even though I4 has stalled, we are
allowed to decode and issue I5 since it does not conflict with any pending
instruction. However, skipping over instructions causes a new problem. Suppose
that I5 had used an operand computed by the skipped instruction, I4. With the
current scoreboard, we would not have noticed this. As a consequence, we have to
extend the scoreboard to keep track of stores done by skipped-over instructions.
This can be done by adding a second bit map, 1 bit per register, to keep track of
stores done