Hardware Reference
In-Depth Information
the equivalent of 10 registers that can be designated as result registers (as opposed to the four
double-precision registers that the 360 architecture contains). In a processor with more real
registers, we would want renaming to provide an even larger set of virtual registers. The tag
field describes which reservation station contains the instruction that will produce a result
needed as a source operand.
Once an instruction has issued and is waiting for a source operand, it refers to the operand
by the reservation station number where the instruction that will write the register has been
assigned. Unused values, such as zero, indicate that the operand is already available in the
registers. Because there are more reservation stations than actual register numbers, WAW
and WAR hazards are eliminated by renaming results using reservation station numbers.
Although in Tomasulo's scheme the reservation stations are used as the extended virtual
registers, other approaches could use a register set with additional registers or a structure like
the reorder buffer, which we will see in Section 3.6.
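The renaming step described above can be sketched in a few lines of Python. This is a minimal illustration, not the 360/91 implementation: the function name `issue`, the dictionaries `reg_status` and `reg_file`, and the station numbers are all hypothetical, and a tag of 0 is assumed to mean "value already in the register file".

```python
# Hypothetical sketch of renaming at issue time.
# reg_status maps a register name to the tag of the reservation station
# that will write it; a missing entry or 0 means the register file is current.

def issue(station_tag, dest, srcs, reg_status, reg_file):
    """Rename the sources of a newly issued instruction and claim its destination."""
    operands = []
    for r in srcs:
        tag = reg_status.get(r, 0)
        if tag == 0:
            # Value is copied now, so a later write to r cannot disturb it (no WAR).
            operands.append(("value", reg_file[r]))
        else:
            # Value is still being produced: wait on the producer's station number.
            operands.append(("tag", tag))
    # The most recent writer owns the register; older writers are ignored (no WAW).
    reg_status[dest] = station_tag
    return operands

reg_file = {"F0": 1.0, "F2": 2.0, "F4": 4.0, "F6": 6.0, "F8": 8.0}
reg_status = {}

ops1 = issue(1, "F0", ["F2", "F4"], reg_status, reg_file)  # writes F0
ops2 = issue(2, "F6", ["F0", "F8"], reg_status, reg_file)  # RAW on F0, reads F8
ops3 = issue(3, "F8", ["F2", "F4"], reg_status, reg_file)  # WAR against ops2's F8
ops4 = issue(4, "F0", ["F2", "F4"], reg_status, reg_file)  # WAW against ops1's F0
```

After these four issues, the second instruction holds the tag of station 1 for F0 and a private copy of F8, so the later writes to F8 and F0 cannot affect it.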
In Tomasulo's scheme, as well as the subsequent methods we look at for supporting
speculation, results are broadcast on a bus (the CDB), which is monitored by the reservation
stations. The combination of the common result bus and the retrieval of results from the bus by
the reservation stations implements the forwarding and bypassing mechanisms used in a
statically scheduled pipeline. In doing so, however, a dynamically scheduled scheme introduces
one cycle of latency between source and result, since the matching of a result and its use
cannot be done until the Write Result stage. Thus, in a dynamically scheduled pipeline, the
effective latency between a producing instruction and a consuming instruction is at least one cycle
longer than the latency of the functional unit producing the result.
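As a rough sketch of how the reservation stations pick results off the CDB, the following Python models each station as a dictionary. The function name `broadcast` and the dictionary layout are assumptions made for illustration; the field names Qj, Qk, Vj, and Vk follow the station fields used in this section. Because the match happens only when the result is written, a consumer can begin executing no earlier than the following cycle.

```python
# Hypothetical sketch of a CDB broadcast.
# Each station is a dict; a Q field of 0 means the matching V field is valid.

def broadcast(tag, value, stations):
    """Deliver a result to every busy station waiting on `tag`.

    Returns the number of operands that captured the value.
    """
    captured = 0
    for rs in stations:
        if not rs["busy"]:
            continue  # idle stations do not monitor the bus
        for q, v in (("Qj", "Vj"), ("Qk", "Vk")):
            if rs[q] == tag:
                rs[v] = value  # capture the value from the bus
                rs[q] = 0      # operand is now available
                captured += 1
    return captured

stations = [
    {"busy": True,  "Qj": 1, "Vj": None, "Qk": 0, "Vk": 8.0},
    {"busy": True,  "Qj": 1, "Vj": None, "Qk": 1, "Vk": None},
    {"busy": False, "Qj": 1, "Vj": None, "Qk": 0, "Vk": None},
]
captured = broadcast(1, 0.5, stations)
```

A single broadcast satisfies every waiting operand at once, which is what lets one bus stand in for the point-to-point forwarding paths of a statically scheduled pipeline.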
It is important to remember that the tags in the Tomasulo scheme refer to the buffer or unit
that will produce a result; the register names are discarded when an instruction issues to a
reservation station. (This is a key difference between Tomasulo's scheme and scoreboarding: In
scoreboarding, operands stay in the registers and are only read after the producing instruction
completes and the consuming instruction is ready to execute.)
Each reservation station has seven fields:
■ Op—The operation to perform on source operands S1 and S2.
■ Qj, Qk—The reservation stations that will produce the corresponding source operand; a
value of zero indicates that the source operand is already available in Vj or Vk, or is
unnecessary.
■ Vj, Vk—The value of the source operands. Note that only one of the V fields or the Q field
is valid for each operand. For loads, the Vk field is used to hold the offset field.
■ A—Used to hold information for the memory address calculation for a load or store.
Initially, the immediate field of the instruction is stored here; after the address calculation, the
effective address is stored here.
■ Busy—Indicates that this reservation station and its accompanying functional unit are
occupied.
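The seven fields listed above map naturally onto a small record type. The sketch below is one possible Python rendering; the class name `ReservationStation`, the lowercase attribute names, the value types, and the helper `ready()` are assumptions for illustration and are not part of the original description.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    """Hypothetical record holding the seven fields of one reservation station."""
    op: Optional[str] = None    # Op: operation to perform on the source operands
    qj: int = 0                 # Qj: producing station for operand 1 (0 = value in vj)
    qk: int = 0                 # Qk: producing station for operand 2 (0 = value in vk)
    vj: Optional[float] = None  # Vj: value of operand 1, valid only when qj == 0
    vk: Optional[float] = None  # Vk: value of operand 2, valid only when qk == 0
    a: Optional[int] = None     # A: immediate, later the effective address
    busy: bool = False          # Busy: station and its functional unit are occupied

    def ready(self) -> bool:
        """An occupied station may execute once neither operand is pending."""
        return self.busy and self.qj == 0 and self.qk == 0

# One operand is available; the other is still being produced by station 3.
rs = ReservationStation(op="ADD", qj=0, vj=2.5, qk=3, busy=True)
waiting = rs.ready()

# Station 3's result arrives: store the value and clear the tag.
rs.vk, rs.qk = 4.0, 0
ready_now = rs.ready()
```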
The register file has a field, Qi:
■ Qi—The number of the reservation station that contains the operation whose result should
be stored into this register. If the value of Qi is blank (or 0), no currently active instruction
is computing a result destined for this register, meaning that the value is simply the
register contents.
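To illustrate how Qi decides which result reaches a register, here is a small sketch of the register-file side of a result write. The function name `write_result` and the two dictionaries are hypothetical; the rule shown, that a register accepts a value only when its Qi matches the broadcasting station, follows from the Qi definition above.

```python
# Hypothetical sketch: the register file's view of a result write.
# qi[name] holds the station number expected to write that register (0 = none).

def write_result(tag, value, regs, qi):
    """Update every register still waiting on station `tag`; return their names."""
    updated = []
    for name in regs:
        if qi[name] == tag:
            regs[name] = value
            qi[name] = 0  # no active instruction targets this register any more
            updated.append(name)
    return updated

regs = {"F0": 1.0, "F6": 6.0}
qi = {"F0": 4, "F6": 2}   # F0 was first claimed by station 1, then renamed to station 4

first = write_result(1, 0.5, regs, qi)   # stale writer: F0 no longer waits on station 1
second = write_result(4, 9.0, regs, qi)  # current writer: F0 is updated
```

The result from station 1 is still visible on the bus to any reservation station that captured its tag, but it never reaches F0, which is how the later write wins without stalling.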
The load and store buffers each have a field, A, which holds the effective address
once the first step of execution has been completed.
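The two-phase use of the A field can be sketched as follows. This assumes, for illustration only, that the base register value arrives through Vj (with Qj as its tag) and that A initially holds the immediate; the function name `compute_effective_address` and the dictionary layout are hypothetical.

```python
# Hypothetical sketch of the first execution step for a load or store buffer.
# Assumption: Vj carries the base register value, A starts as the immediate.

def compute_effective_address(buf):
    """Overwrite A with base + immediate once the base value is available."""
    if buf["Qj"] != 0:
        return False  # base register value is still being produced
    buf["A"] = buf["Vj"] + buf["A"]
    return True

load_ready = {"busy": True, "Qj": 0, "Vj": 1000, "A": 16}
load_waiting = {"busy": True, "Qj": 2, "Vj": None, "A": 16}

done = compute_effective_address(load_ready)       # A becomes the effective address
stalled = compute_effective_address(load_waiting)  # A still holds the immediate
```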
In the next section, we will first consider some examples that show how these mechanisms
work and then examine the detailed algorithm.