Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach


the equivalent of 10 registers that can be designated as result registers (as opposed to the four double-precision registers that the 360 architecture contains). In a processor with more real registers, we would want renaming to provide an even larger set of virtual registers. The tag field describes which reservation station contains the instruction that will produce a result needed as a source operand.

Once an instruction has issued and is waiting for a source operand, it refers to the operand by the number of the reservation station to which the instruction that will write the register has been assigned. An otherwise unused tag value, such as zero, indicates that the operand is already available in the registers. Because there are more reservation stations than actual register numbers, WAW and WAR hazards are eliminated by renaming results using reservation station numbers. Although in Tomasulo's scheme the reservation stations are used as the extended virtual registers, other approaches could use a register set with additional registers or a structure like the reorder buffer, which we will see in Section 3.6.
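The renaming described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's algorithm: the function name `rename`, the tuple layout, and the register values are assumptions made for the example, and every instruction simply receives a fresh station number (a real machine has a fixed number of stations and stalls when none is free).

```python
def rename(program, reg_values):
    """Issue instructions in order, capturing each source as a value or a tag.

    Tag 0 means "no pending producer"; any other tag names the (hypothetical)
    reservation station that will produce the operand.
    """
    qi = {reg: 0 for reg in reg_values}   # register -> producing station tag
    stations = []
    for tag, (op, dest, src1, src2) in enumerate(program, start=1):
        operands = []
        for src in (src1, src2):
            if qi[src]:
                operands.append(("Q", qi[src]))          # wait on a station
            else:
                operands.append(("V", reg_values[src]))  # copy the value now
        qi[dest] = tag   # later readers of dest will wait on this station
        stations.append((tag, op, operands[0], operands[1]))
    return stations, qi


# Illustrative sequence: ADD reads F8 before SUB writes it (WAR),
# and both ADD and MUL write F6 (WAW).
program = [
    ("DIV", "F0", "F2", "F4"),
    ("ADD", "F6", "F0", "F8"),
    ("SUB", "F8", "F10", "F14"),
    ("MUL", "F6", "F10", "F8"),
]
reg_values = {"F0": 0.0, "F2": 2.0, "F4": 4.0, "F6": 6.0,
              "F8": 8.0, "F10": 10.0, "F14": 14.0}
stations, qi = rename(program, reg_values)
for entry in stations:
    print(entry)
```

In this sketch ADD has already copied the old value of F8 when SUB issues, so SUB may complete at any time without disturbing it (no WAR hazard), and F6 ends up tagged with MUL's station only, so ADD's result can never overwrite MUL's (no WAW hazard).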

In Tomasulo's scheme, as well as the subsequent methods we look at for supporting speculation, results are broadcast on a bus (the common data bus, or CDB), which is monitored by the reservation stations. The combination of the common result bus and the retrieval of results from the bus by the reservation stations implements the forwarding and bypassing mechanisms used in a statically scheduled pipeline. In doing so, however, a dynamically scheduled scheme introduces one cycle of latency between source and result, since the matching of a result and its use cannot be done until the Write Result stage. Thus, in a dynamically scheduled pipeline, the effective latency between a producing instruction and a consuming instruction is at least one cycle longer than the latency of the functional unit producing the result.
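The extra cycle can be made concrete with a small timing sketch. The cycle-numbering convention here is an assumption chosen for illustration (execution occupies `fu_latency` consecutive cycles, Write Result takes the following cycle, and a waiting consumer starts the cycle after that); the function name `consumer_start_cycle` is hypothetical.

```python
def consumer_start_cycle(producer_start, fu_latency, dynamic=True):
    """Earliest cycle in which a dependent instruction can begin execution."""
    exec_done = producer_start + fu_latency - 1  # producer's last execute cycle
    if dynamic:
        write_result = exec_done + 1   # result is broadcast on the CDB
        return write_result + 1        # consumer picks it up, then executes
    return exec_done + 1               # direct forwarding, no Write Result stage


static_start = consumer_start_cycle(1, 4, dynamic=False)
dynamic_start = consumer_start_cycle(1, 4, dynamic=True)
print(static_start, dynamic_start, dynamic_start - static_start)
```

Under these assumptions, a four-cycle functional unit that starts in cycle 1 lets a forwarded consumer start in cycle 5, while the consumer waiting on the CDB starts in cycle 6, one cycle later.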

It is important to remember that the tags in the Tomasulo scheme refer to the buffer or unit that will produce a result; the register names are discarded when an instruction issues to a reservation station. (This is a key difference between Tomasulo's scheme and scoreboarding: In scoreboarding, operands stay in the registers and are only read after the producing instruction completes and the consuming instruction is ready to execute.)

Each reservation station has seven fields:

■ Op—The operation to perform on source operands S1 and S2.

■ Qj, Qk—The reservation stations that will produce the corresponding source operand; a value of zero indicates that the source operand is already available in Vj or Vk, or is unnecessary.

■ Vj, Vk—The values of the source operands. Note that for each operand, only one of the V field or the Q field is valid. For loads, the Vk field is used to hold the offset field.

■ A—Used to hold information for the memory address calculation for a load or store. Initially, the immediate field of the instruction is stored here; after the address calculation, the effective address is stored here.

■ Busy—Indicates that this reservation station and its accompanying functional unit are occupied.
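As a rough illustration of the seven fields just listed, a reservation station can be modeled as a small record. The class name, the Python types, and the `ready` helper are assumptions made for this sketch; the text specifies only the field names and their meanings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    """One reservation station with the seven fields described in the text."""
    op: Optional[str] = None    # Op: operation to perform
    qj: int = 0                 # Qj: station producing operand 1 (0 = none)
    qk: int = 0                 # Qk: station producing operand 2 (0 = none)
    vj: Optional[float] = None  # Vj: value of operand 1, once available
    vk: Optional[float] = None  # Vk: value of operand 2, once available
    a: Optional[int] = None     # A: immediate, later the effective address
    busy: bool = False          # Busy: station is occupied

    def ready(self) -> bool:
        """Hypothetical helper: occupied and no operand is still pending."""
        return self.busy and self.qj == 0 and self.qk == 0


rs = ReservationStation(op="ADD", vj=1.0, qk=3, busy=True)
waiting = rs.ready()        # still waiting on station 3 for operand 2
rs.vk, rs.qk = 2.0, 0       # operand arrives; clear the tag
print(waiting, rs.ready())
```

The `ready` check reflects the rule that, for each operand, either the Q tag or the V value is meaningful: an instruction can begin execution only when both Q fields are zero.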

The register file has a field, Qi:

■ Qi—The number of the reservation station that contains the operation whose result should be stored into this register. If the value of Qi is blank (or 0), no currently active instruction is computing a result destined for this register, meaning that the value is simply the register contents.
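The role of Qi when a result is written can be sketched as follows. This is a simplified model under stated assumptions: the function name `broadcast` and the use of plain dictionaries for registers and stations are illustrative choices, not part of the scheme as described.

```python
def broadcast(tag, value, regs, qi, stations):
    """Deliver a result tagged with its producing station number."""
    # A register accepts the value only if it is still waiting on this tag.
    for name, producer in qi.items():
        if producer == tag:
            regs[name] = value
            qi[name] = 0            # 0: the register now holds a valid value
    # Waiting reservation stations capture the value and clear the tag.
    for rs in stations:
        if rs["Qj"] == tag:
            rs["Vj"], rs["Qj"] = value, 0
        if rs["Qk"] == tag:
            rs["Vk"], rs["Qk"] = value, 0


regs = {"F0": 0.0, "F6": 6.0}
qi = {"F0": 1, "F6": 4}             # F6 was last renamed to station 4
stations = [{"Qj": 1, "Vj": None, "Qk": 0, "Vk": 8.0}]

broadcast(1, 0.5, regs, qi, stations)   # station 1 finishes: F0 is updated
broadcast(2, 99.0, regs, qi, stations)  # station 2 also targeted F6, but
                                        # F6 now waits on station 4
print(regs, qi, stations)
```

Because F6's Qi names station 4, the result tagged 2 is ignored by the register file, which is how an older write to the same register is suppressed in this sketch.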

The load and store buffers each have a field, A, which holds the result of the effective address calculation once the first step of execution has been completed.
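The two roles of the A field can be shown with a very small sketch. The function names and the dictionary layout are hypothetical, and the base register value is passed in directly rather than read from a modeled register file.

```python
def issue_load(imm):
    """At issue, A holds the immediate (offset) field of the instruction."""
    return {"Busy": True, "A": imm}


def compute_address(entry, base_value):
    """First execution step: A is overwritten with the effective address."""
    entry["A"] = base_value + entry["A"]
    return entry["A"]


entry = issue_load(imm=32)
compute_address(entry, base_value=1000)
print(entry["A"])
```

After the first step the immediate is no longer needed, so reusing the same field for the effective address costs no extra storage.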

In the next section, we will first consider some examples that show how these mechanisms work and then examine the detailed algorithm.
