Hardware Reference
Since our example is a superscalar machine that can issue two instructions per
cycle, a second instruction (I2) is issued during cycle 1. It adds R0 and R2, storing
the result in R4. To see if this instruction can be issued, these rules are applied:
1. If any operand is being written, do not issue (RAW dependence).
2. If the result register is being read, do not issue (WAR dependence).
3. If the result register is being written, do not issue (WAW dependence).
We have already seen RAW dependences, which occur when an instruction needs
to use as a source a result that a previous instruction has not yet produced. The
other two dependences are less serious. They are essentially resource conflicts. In
a WAR dependence (Write After Read), one instruction is trying to overwrite a
register that a previous instruction may not yet have finished reading. A WAW
dependence (Write After Write) is similar: one instruction is trying to write a
register that a previous instruction has not yet finished writing. Both can often
be avoided by having the second instruction put its results somewhere else
(perhaps temporarily). If none of the above three dependences exists, and the
functional unit it needs is available, the instruction is issued. In this case, I2
uses a register (R0) that is being read by a pending instruction, but two reads of
the same register do not conflict, so I2 is issued. Similarly, I3
is issued during cycle 2.
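
The three issue rules can be sketched in a few lines of Python. This is only an
illustration, not the hardware's actual logic: the Scoreboard class and its method
names are hypothetical, and the registers used for I1 and I4 are assumed for the
example, since this passage does not list them. The functional-unit check is
omitted.

```python
from collections import Counter

class Scoreboard:
    """Hypothetical per-register counters of pending reads and writes."""

    def __init__(self):
        self.reading = Counter()  # register -> pending instructions reading it
        self.writing = Counter()  # register -> pending instructions writing it

    def can_issue(self, sources, dest):
        """Apply the three rules in order; return (ok, blocking dependence)."""
        if any(self.writing[r] > 0 for r in sources):
            return False, "RAW"   # rule 1: an operand is being written
        if self.reading[dest] > 0:
            return False, "WAR"   # rule 2: the result register is being read
        if self.writing[dest] > 0:
            return False, "WAW"   # rule 3: the result register is being written
        return True, None

    def issue(self, sources, dest):
        """Record the registers a newly issued instruction is using."""
        for r in sources:
            self.reading[r] += 1
        self.writing[dest] += 1

    def retire(self, sources, dest):
        """Release the registers when the instruction retires."""
        for r in sources:
            self.reading[r] -= 1
        self.writing[dest] -= 1


sb = Scoreboard()
sb.issue(["R0", "R1"], "R3")             # I1 (registers assumed)
print(sb.can_issue(["R0", "R2"], "R4"))  # I2: (True, None), shared read of R0 is fine
sb.issue(["R0", "R2"], "R4")
print(sb.can_issue(["R1", "R4"], "R6"))  # I4 (registers assumed): (False, 'RAW')
```

Counters rather than simple flags are used here because several pending
instructions may read the same register at once, as I1 and I2 do with R0.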
Now we come to I4, which needs to use R4. Unfortunately, we see from line 3
that R4 is being written. Here we have a RAW dependence, so the decode unit
stalls until R4 becomes available. While stalled, it stops pulling instructions from
the fetch unit. When the fetch unit's internal buffers fill up, it stops prefetching.
It is worth noting that the next instruction in program order, I5, does not have
conflicts with any of the pending instructions. It could have been decoded and
issued were it not for the fact that this design requires issuing instructions in order.
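
The effect of in-order issue can be sketched as a loop that stops at the first
blocked instruction. The function name and the register choices for I4 and I5 are
assumptions made for illustration, and plain sets stand in for the hardware's
bookkeeping.

```python
def issue_in_order(pending, being_read, being_written, width=2):
    """Issue up to `width` instructions from `pending`, strictly in program order.

    pending: list of (name, sources, dest) tuples in program order.
    being_read / being_written: sets of registers in use by pending instructions.
    """
    issued = []
    for name, sources, dest in pending:
        if len(issued) == width:
            break
        blocked = (
            any(s in being_written for s in sources)  # RAW
            or dest in being_read                     # WAR
            or dest in being_written                  # WAW
        )
        if blocked:
            break  # in-order issue: nothing behind a stalled instruction may go
        issued.append(name)
        being_read = being_read | set(sources)
        being_written = being_written | {dest}
    return issued


# R4 is still being written, so I4 stalls and the independent I5 waits behind it.
queue = [("I4", ("R1", "R4"), "R6"), ("I5", ("R1", "R2"), "R7")]
print(issue_in_order(queue, being_read={"R0"}, being_written={"R4"}))  # []
```

Replacing the second break with continue would let I5 slip past I4, which is
exactly the out-of-order issue this design forbids.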
Now let us look at what happens during cycle 3. I2, being an addition (two
cycles), finishes at the end of cycle 3. Unfortunately, it cannot be retired (thus
freeing up R4 for I4). Why not? The reason is that this design also requires in-
order retirement. Why? What harm could possibly come from doing the store into
R4 now and marking it as available?
The answer is subtle, but important. Suppose that instructions could complete
out of order. Then if an interrupt occurred, it would be difficult to save the state of
the machine so it could be restored later. In particular, it would not be possible to
say that all instructions up to some address had been executed and all instructions
beyond it had not. Being able to say this is what makes an interrupt precise, a
desirable characteristic in a CPU (Moudgill and Vassiliadis, 1996). Out-of-order
retirement makes interrupts imprecise, which is why some machines complete
instructions in order.
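
In-order retirement can be sketched as a queue that releases instructions only
from its head. The function below is a hypothetical illustration, not a
description of any particular machine: an instruction that has finished executing
still waits until every earlier instruction has finished too.

```python
from collections import deque

def retire_in_order(in_flight, finished):
    """Retire finished instructions from the head of the program-order queue.

    in_flight: deque of instruction names, oldest first (modified in place).
    finished: set of names whose execution has completed.
    """
    retired = []
    while in_flight and in_flight[0] in finished:
        retired.append(in_flight.popleft())
    return retired


in_flight = deque(["I1", "I2", "I3"])
# I2 has finished, but I1 is still executing, so nothing retires.
print(retire_in_order(in_flight, {"I2"}))              # []
# Once all three have finished, they retire together, in program order.
print(retire_in_order(in_flight, {"I1", "I2", "I3"}))  # ['I1', 'I2', 'I3']
```

Because retirement always stops at the oldest unfinished instruction, the machine
state at any moment corresponds to a clean boundary in the program, which is the
property a precise interrupt needs.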
Getting back to our example, at the end of cycle 4, all three pending instructions
can be retired, so in cycle 5 I4 can finally be issued, along with the newly
decoded I5. Whenever an instruction is retired, the decode unit has to check to see
if there is a stalled instruction that can now be issued.
 