There is an important difference in how stores are handled in a speculative processor
versus in Tomasulo's algorithm. In Tomasulo's algorithm, a store can update memory when
it reaches write result (which ensures that the effective address has been calculated) and the
data value to store is available. In a speculative processor, a store updates memory only when
it reaches the head of the ROB. This difference ensures that memory is not updated until an
instruction is no longer speculative.
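The commit rule above can be illustrated with a minimal sketch. It assumes a much simplified ROB in which each entry carries only a kind, a destination, a value, and a ready flag; the names RobEntry and commit are hypothetical and do not come from Figure 3.14.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    kind: str            # "store" or "alu" (hypothetical instruction classes)
    dest: int            # memory address for a store, register number otherwise
    value: int = 0
    ready: bool = False  # True once the instruction has finished write result


def commit(rob, memory, regs):
    """Retire completed entries from the head of the ROB, in program order."""
    retired = 0
    while rob and rob[0].ready:
        entry = rob.popleft()
        if entry.kind == "store":
            memory[entry.dest] = entry.value  # the only place memory is written
        else:
            regs[entry.dest] = entry.value
        retired += 1
    return retired


memory, regs = {}, {}
rob = deque([
    RobEntry("alu", dest=1),                            # older, not yet finished
    RobEntry("store", dest=0x40, value=7, ready=True),  # younger, already finished
])

first = commit(rob, memory, regs)    # head is not ready, so the store must wait
memory_after_first = dict(memory)

rob[0].value, rob[0].ready = 5, True
second = commit(rob, memory, regs)   # both entries now retire, in order
```

Although the store has finished, it leaves memory untouched until the older instruction ahead of it has committed, which is the behavior that keeps speculative stores from becoming visible.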
Figure 3.14 has one significant simplification for stores, which is unneeded in practice.
Figure 3.14 requires stores to wait in the write result stage for the register source operand whose
value is to be stored; the value is then moved from the Vk field of the store's reservation
station to the Value field of the store's ROB entry. In reality, however, the value to be stored need
not arrive until just before the store commits and can be placed directly into the store's ROB
entry by the sourcing instruction. This is accomplished by having the hardware track when
the source value to be stored is available in the store's ROB entry and searching the ROB on
every instruction completion to look for dependent stores.
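One way to picture this tracking is the sketch below. It assumes an extra per-store field, here called waiting_on, that holds the ROB tag of the instruction producing the value to be stored; that field name, the StoreEntry class, and both helper functions are hypothetical and are not part of Figure 3.14.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    tag: int                          # ROB entry number of the store
    addr: int                         # Destination field (effective address)
    value: Optional[int] = None       # Value field, filled in when available
    waiting_on: Optional[int] = None  # assumed extra field: tag of the producer


def broadcast_result(rob, producer_tag, result):
    """On an instruction completion, search the ROB for dependent stores."""
    filled = []
    for entry in rob:
        if entry.waiting_on == producer_tag:
            entry.value = result      # value goes straight into the ROB entry
            entry.waiting_on = None
            filled.append(entry.tag)
    return filled


def can_commit(entry):
    """A store at the head of the ROB may commit once its value has arrived."""
    return entry.waiting_on is None and entry.value is not None


rob = [
    StoreEntry(tag=3, addr=0x80, waiting_on=1),  # value produced by ROB entry 1
    StoreEntry(tag=4, addr=0x88, value=9),       # value was already available
]
before = can_commit(rob[0])
filled = broadcast_result(rob, producer_tag=1, result=42)
after = can_commit(rob[0])
```

In this sketch the store never waits in the write result stage; it only has to have its value by the time it reaches the head of the ROB.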
This addition is not complicated, but adding it has two effects: We would need to add a field
to the ROB, and Figure 3.14, which is already in a small font, would be even longer! Although
Figure 3.14 makes this simplification, in our examples, we will allow the store to pass through
the write result stage and simply wait for the value to be ready when it commits.
As in Tomasulo's algorithm, we must avoid hazards through memory. WAW and WAR
hazards through memory are eliminated with speculation because the actual updating of memory
occurs in order, when a store is at the head of the ROB, and, hence, no earlier loads or stores
can still be pending. RAW hazards through memory are avoided by two restrictions:
1. Not allowing a load to initiate the second step of its execution if any active ROB entry
occupied by a store has a Destination field that matches the value of the A field of the load.
2. Maintaining the program order for the computation of an effective address of a load with
respect to all earlier stores.
Together, these two restrictions ensure that any load that accesses a memory location written
to by an earlier store cannot perform the memory access until the store has written the data.
Some speculative processors will actually bypass the value from the store to the load directly,
when such a RAW hazard occurs. Another approach is to predict potential collisions using a
form of value prediction; we consider this in Section 3.9.
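The first restriction, together with the bypassing variant, might be sketched as follows. The sketch assumes the ROB is a list in program order and that the second restriction already holds, so every earlier store has a known address when the load is examined. The Entry class, load_action, and the forwarding flag are hypothetical names, and only stores earlier than the load are scanned, since those are the ones that can create a RAW hazard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    kind: str                    # "load" or "store"
    addr: int                    # Destination (store) or A field (load)
    value: Optional[int] = None  # store data, once it is available


def load_action(rob: List[Entry], load_index: int, forwarding: bool = False):
    """Decide what a load may do, given the earlier stores still in the ROB."""
    load = rob[load_index]
    # Scan from the youngest earlier entry to the oldest, so that the most
    # recent matching store is the one that decides the outcome.
    for entry in reversed(rob[:load_index]):
        if entry.kind == "store" and entry.addr == load.addr:
            if forwarding and entry.value is not None:
                return ("forward", entry.value)  # bypass store data to the load
            return ("wait", None)                # hold the load's memory access
    return ("access_memory", None)               # no conflicting earlier store


rob = [
    Entry("store", 0x10, value=1),
    Entry("store", 0x20),          # address known, data not yet available
    Entry("load", 0x10),
    Entry("load", 0x20),
    Entry("load", 0x30),
]
blocked = load_action(rob, 2)
bypassed = load_action(rob, 2, forwarding=True)
no_data_yet = load_action(rob, 3, forwarding=True)
independent = load_action(rob, 4)
```

Without forwarding, a matching load simply waits until the store has committed and left the ROB; with forwarding, it can take the data as soon as the store's value is present.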
Although this explanation of speculative execution has focused on floating point, the
techniques easily extend to the integer registers and functional units. Indeed, speculation may be
more useful in integer programs, since such programs tend to have code where the branch
behavior is less predictable. Additionally, these techniques can be extended to work in a
multiple-issue processor by allowing multiple instructions to issue and commit every clock.
In fact, speculation is probably most interesting in such processors, since less ambitious
techniques can probably exploit sufficient ILP within basic blocks when assisted by a compiler.
3.7 Exploiting ILP Using Multiple Issue and Static Scheduling
The techniques of the preceding sections can be used to eliminate data and control stalls and
achieve an ideal CPI of one. To improve performance further, we would like to decrease the
CPI to less than one, but the CPI cannot be reduced below one if we issue only one instruction
every clock cycle.
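The bound can be written out as a short worked relation. The symbol w, standing for the maximum number of instructions issued per clock cycle, is introduced here only for illustration.

```latex
\[
\text{CPI} \;=\; \frac{\text{clock cycles}}{\text{instructions}} \;\ge\; \frac{1}{w},
\qquad\text{so}\qquad
w = 1 \;\Rightarrow\; \text{CPI} \ge 1,
\qquad
w = 2 \;\Rightarrow\; \text{CPI} \ge 0.5 .
\]
```

Under this assumption, a CPI below one is reachable only when w is greater than one, that is, when more than one instruction can issue in a clock cycle.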
 