Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

There is an important difference in how stores are handled in a speculative processor versus in Tomasulo's algorithm. In Tomasulo's algorithm, a store can update memory when it reaches write result (which ensures that the effective address has been calculated) and the data value to store is available. In a speculative processor, a store updates memory only when it reaches the head of the ROB. This difference ensures that memory is not updated until an instruction is no longer speculative.
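
To make the contrast concrete, here is a minimal Python sketch of commit-time stores. It models a toy ROB as a deque whose head is at the left. The class `RobEntry` and the functions `commit_one` and `squash` are hypothetical names for illustration, not structures defined in the text. The only point is that memory changes inside `commit_one`, so a store that is squashed never writes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class RobEntry:
    """Hypothetical, simplified ROB entry; field names are illustrative."""
    kind: str               # "store" or "other"
    address: Optional[int]  # effective address (stores only)
    value: Optional[int]    # data to write (stores only)
    ready: bool             # True once the entry has finished executing


def commit_one(rob: Deque[RobEntry], memory: Dict[int, int]) -> bool:
    """Commit the head entry if it is ready.

    A store updates memory only here, at the head of the ROB, so memory
    never sees a write from an instruction that is still speculative.
    """
    if not rob or not rob[0].ready:
        return False
    head = rob.popleft()
    if head.kind == "store":
        memory[head.address] = head.value
    return True


def squash(rob: Deque[RobEntry]) -> None:
    """On a misprediction, discard all uncommitted entries; memory is untouched."""
    rob.clear()


# Usage: the older store commits; the younger store lies on a mispredicted
# path and is squashed before it can reach memory.
memory: Dict[int, int] = {}
rob = deque([
    RobEntry("store", 100, 7, ready=True),
    RobEntry("store", 104, 9, ready=True),
])
commit_one(rob, memory)
squash(rob)
print(memory)  # {100: 7}
```

In Tomasulo's algorithm, by contrast, the write to `memory` would happen as soon as the store had its address and data. A later squash could not undo that write.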

Figure 3.14 makes one significant simplification for stores that is unnecessary in practice. Figure 3.14 requires stores to wait in the write result stage for the register source operand whose value is to be stored; the value is then moved from the Vk field of the store's reservation station to the Value field of the store's ROB entry. In reality, however, the value to be stored need not arrive until just before the store commits and can be placed directly into the store's ROB entry by the sourcing instruction. This is accomplished by having the hardware track when the source value to be stored is available in the store's ROB entry and searching the ROB on every instruction completion to look for dependent stores.
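
The tracking described above can be sketched in Python. This is a hedged sketch under simplifying assumptions. Each store's ROB entry is given a hypothetical `value_tag` field naming the ROB entry that will produce its data; this is the "added field" mentioned next. A hypothetical `on_completion` routine then searches the ROB whenever any instruction completes. None of these names come from Figure 3.14.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoreEntry:
    """Hypothetical ROB entry for a store; field names are illustrative."""
    address: int
    value: Optional[int] = None      # data to store, once it has arrived
    value_tag: Optional[int] = None  # ROB number of the producer still owed


def on_completion(rob: List[StoreEntry], producer_tag: int, result: int) -> int:
    """Called whenever an instruction completes: search the ROB for stores
    waiting on this producer and place the result directly in their entries.
    Returns the number of store entries filled."""
    filled = 0
    for entry in rob:
        if entry.value_tag == producer_tag:
            entry.value = result
            entry.value_tag = None
            filled += 1
    return filled


def store_can_commit(entry: StoreEntry) -> bool:
    """A store needs its data only by commit time, not at write result."""
    return entry.value_tag is None and entry.value is not None


# Usage (for brevity the list holds only store entries).
rob = [StoreEntry(address=200, value_tag=3), StoreEntry(address=204, value=5)]
print(store_can_commit(rob[0]))                       # False: waiting on ROB entry 3
print(on_completion(rob, producer_tag=3, result=42))  # 1
print(rob[0].value, store_can_commit(rob[0]))         # 42 True
```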

This addition is not complicated, but adding it has two effects: We would need to add a field to the ROB, and Figure 3.14, which is already in a small font, would be even longer! Although Figure 3.14 makes this simplification, in our examples, we will allow the store to pass through the write result stage and simply wait for the value to be ready when it commits.

As in Tomasulo's algorithm, we must avoid hazards through memory. WAW and WAR hazards through memory are eliminated with speculation because the actual updating of memory occurs in order, when a store is at the head of the ROB, and, hence, no earlier loads or stores can still be pending. RAW hazards through memory are prevented by two restrictions:

1. Not allowing a load to initiate the second step of its execution if any active ROB entry occupied by a store has a Destination field that matches the value of the A field of the load.
2. Maintaining the program order for the computation of an effective address of a load with respect to all earlier stores.
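
Restriction 1 amounts to a simple check before the load's memory access. The Python sketch below is a minimal illustration of it. It assumes that restriction 2 already holds, so every store's Destination (its effective address) is known. `RobEntry` and `load_may_access_memory` are hypothetical names, not from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RobEntry:
    """Hypothetical ROB entry; for a store, destination is the memory address
    it will write (for other instructions it names a register)."""
    is_store: bool
    destination: int


def load_may_access_memory(active_rob: List[RobEntry], load_a: int) -> bool:
    """Restriction 1: the load may not start its memory access (second step)
    while any active store in the ROB has Destination equal to the load's A field.

    Assumes restriction 2 holds, i.e. every earlier store has already computed
    its effective address, so no store's destination is unknown here.
    """
    return not any(e.is_store and e.destination == load_a for e in active_rob)


# Usage: a store to 0x100 is still in the ROB.
active = [RobEntry(is_store=True, destination=0x100), RobEntry(is_store=False, destination=5)]
print(load_may_access_memory(active, 0x100))  # False: must wait for the store
print(load_may_access_memory(active, 0x104))  # True: no matching store
```

Restriction 2 is what makes this comparison safe. If an earlier store's address were still unknown, no comparison could rule out a conflict.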

Together, these two restrictions ensure that any load that accesses a memory location written to by an earlier store cannot perform the memory access until the store has written the data. Some speculative processors will actually bypass the value from the store to the load directly when such a RAW hazard occurs. Another approach is to predict potential collisions using a form of value prediction; we consider this in Section 3.9.
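
The bypass option can be illustrated with a short Python sketch. It is one plausible way to model store-to-load forwarding, not the design of any particular processor. `PendingStore` and `load_with_bypass` are hypothetical names. The toy memory returns 0 for untouched addresses, which is an assumption of this example only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PendingStore:
    """Hypothetical uncommitted store; field names are illustrative."""
    address: int
    value: Optional[int]  # None until the data to store is available


def load_with_bypass(older_stores: List[PendingStore],
                     memory: Dict[int, int], addr: int) -> Optional[int]:
    """Return the value a load at addr should see, or None if it must stall.

    older_stores lists the uncommitted stores ahead of the load, oldest first.
    The youngest store to the same address is the most recent write, so its
    data is forwarded directly instead of waiting for it to reach memory.
    """
    for store in reversed(older_stores):
        if store.address == addr:
            return store.value  # None here means the data is not ready: stall
    return memory.get(addr, 0)  # no pending store to addr; toy memory defaults to 0


# Usage
older = [PendingStore(0x10, 1), PendingStore(0x10, 2), PendingStore(0x20, None)]
mem = {0x10: 99, 0x30: 7}
print(load_with_bypass(older, mem, 0x10))  # 2  (forwarded from the youngest store)
print(load_with_bypass(older, mem, 0x20))  # None (matching store has no data yet)
print(load_with_bypass(older, mem, 0x30))  # 7  (read from memory)
```

The stale value 99 at address 0x10 is never returned. The pending stores to 0x10 are newer than memory, and the youngest of them wins.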

Although this explanation of speculative execution has focused on floating point, the techniques easily extend to the integer registers and functional units. Indeed, speculation may be more useful in integer programs, since such programs tend to have code where the branch behavior is less predictable. Additionally, these techniques can be extended to work in a multiple-issue processor by allowing multiple instructions to issue and commit every clock. In fact, speculation is probably most interesting in such processors, since less ambitious techniques can probably exploit sufficient ILP within basic blocks when assisted by a compiler.
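
Committing several instructions per clock keeps the in-order rule. The following minimal Python sketch shows this, assuming a hypothetical commit width parameter `width`. `Entry` and `commit_cycle` are illustrative names only. Each clock retires ready entries from the head until it either reaches the width or hits an unfinished instruction.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Entry:
    """Hypothetical ROB entry: an instruction label and a completion flag."""
    name: str
    ready: bool


def commit_cycle(rob: Deque[Entry], width: int) -> List[str]:
    """Commit up to `width` entries from the ROB head in one clock.

    Commit stays in program order: the first entry that is not ready ends the
    group, even if younger entries behind it have already finished.
    """
    committed: List[str] = []
    while rob and len(committed) < width and rob[0].ready:
        committed.append(rob.popleft().name)
    return committed


# Usage with a hypothetical commit width of 4
rob = deque([Entry("i1", True), Entry("i2", True), Entry("i3", False), Entry("i4", True)])
print(commit_cycle(rob, width=4))  # ['i1', 'i2']  (i4 waits behind i3)
rob[0].ready = True                # i3 finishes
print(commit_cycle(rob, width=4))  # ['i3', 'i4']
```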

3.7 Exploiting ILP Using Multiple Issue and Static Scheduling

The techniques of the preceding sections can be used to eliminate data and control stalls and achieve an ideal CPI of one. To improve performance further, we would like to decrease the CPI to less than one, but the CPI cannot be reduced below one if we issue only one instruction every clock cycle.
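
The bound can be stated in one line. Here $w$ is the maximum number of instructions issued per clock; it is notation introduced only for this illustration. Over $C$ clock cycles at most $wC$ instructions can issue. So for a program that executes $I$ instructions:

```latex
I \le w\,C
\quad\Longrightarrow\quad
\mathrm{CPI} = \frac{C}{I} \ge \frac{1}{w}
```

With single issue ($w = 1$), the best achievable CPI is therefore one. A processor able to issue up to four instructions per clock could in principle reach a CPI of 0.25.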
