Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

3. Instruction memory access and buffering: When fetching multiple instructions per cycle, a

variety of complexities are encountered, including the difficulty that fetching multiple in-

structions may require accessing multiple cache lines. The instruction fetch unit encapsu-

lates this complexity, using prefetch to try to hide the cost of crossing cache blocks. The

instruction fetch unit also provides buffering, essentially acting as an on-demand unit to

provide instructions to the issue stage as needed and in the quantity needed.

Virtually all high-end processors now use a separate instruction fetch unit connected to the

rest of the pipeline by a buffer containing pending instructions.
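
To make the cache-line difficulty concrete, the following minimal sketch computes which cache blocks a multi-instruction fetch touches. The function name `lines_touched`, the 4-byte instruction size, and the 64-byte block size are illustrative assumptions, not parameters taken from the text.

```python
def lines_touched(pc, width, instr_bytes=4, line_bytes=64):
    """Return the indices of the cache blocks covered by one fetch.

    pc          -- byte address of the first instruction fetched
    width       -- number of instructions fetched in the cycle
    instr_bytes -- assumed fixed instruction size (hypothetical: 4)
    line_bytes  -- assumed cache block size (hypothetical: 64)
    """
    first = pc // line_bytes
    # Address of the last byte of the last instruction in the fetch group.
    last = (pc + width * instr_bytes - 1) // line_bytes
    return list(range(first, last + 1))


if __name__ == "__main__":
    # A 4-wide fetch starting at byte 48 stays within block 0.
    print(lines_touched(48, 4))
    # The same fetch starting at byte 56 spills into block 1.
    print(lines_touched(56, 4))
```

Under these assumptions, a fetch group that starts near the end of a block needs two block accesses, which is the case the fetch unit tries to hide with prefetching and buffering.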

Speculation: Implementation Issues And Extensions

In this section we explore four issues that involve the design trade-offs in speculation, starting

with the use of register renaming, the approach that is often used instead of a reorder buffer.

We then discuss one important possible extension to speculation on control flow: an idea

called value prediction.

Speculation Support: Register Renaming versus Reorder Buffers

One alternative to the use of a reorder buffer (ROB) is the explicit use of a larger physical set of

registers combined with register renaming. This approach builds on the concept of renaming

used in Tomasulo's algorithm and extends it. In Tomasulo's algorithm, the values of the archi-

tecturally visible registers (R0, …, R31 and F0, …, F31) are contained, at any point in execution,

in some combination of the register set and the reservation stations. With the addition of spec-

ulation, register values may also temporarily reside in the ROB. In either case, if the processor

does not issue new instructions for a period of time, all existing instructions will commit, and

the register values will appear in the register file, which directly corresponds to the architec-

turally visible registers.

In the register-renaming approach, an extended set of physical registers is used to hold both

the architecturally visible registers and temporary values. Thus, the extended registers

replace most of the function of the ROB and the reservation stations; only a queue to ensure

that instructions complete in order is needed. During instruction issue, a renaming process

maps the names of architectural registers to physical register numbers in the extended register

set, allocating a new unused register for the destination. WAW and WAR hazards are avoided

by renaming of the destination register, and speculation recovery is handled because a phys-

ical register holding an instruction destination does not become the architectural register until

the instruction commits. The renaming map is a simple data structure that supplies the phys-

ical register number of the register that currently corresponds to the specified architectural

register, a function performed by the register status table in Tomasulo's algorithm. When an

instruction commits, the renaming table is permanently updated to indicate that a physical re-

gister corresponds to the actual architectural register, thus effectively finalizing the update to

the processor state. Although an ROB is not necessary with register renaming, the hardware

must still track instructions in a queue-like structure and update the renaming table in strict

order.
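
The renaming process described above can be sketched as a small software model. This is a simplified illustration under stated assumptions, not a description of any particular processor: the class name `RenameTable`, its method names, and the choice to keep separate speculative and committed maps are all hypothetical. The sketch frees the previous physical register when the overwriting instruction commits, which is one simple policy; real designs must be more careful about when a physical register is truly dead.

```python
from collections import deque


class RenameTable:
    """Toy model of register renaming (hypothetical names throughout)."""

    def __init__(self, num_arch, num_phys):
        assert num_phys > num_arch
        # Speculative map used at issue: architectural -> physical register.
        self.spec_map = list(range(num_arch))
        # Committed map: the mapping that is no longer speculative.
        self.arch_map = list(range(num_arch))
        # Physical registers not currently holding any value.
        self.free = deque(range(num_arch, num_phys))
        # In-order queue of (arch_dest, new_phys, old_phys) entries.
        self.in_flight = deque()

    def rename(self, dest, srcs):
        """Rename one instruction; returns (dest_phys, src_phys_list)."""
        # Sources are looked up before the destination is remapped, so an
        # instruction that reads and writes the same register sees the old value.
        src_phys = [self.spec_map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; issue must stall")
        new_phys = self.free.popleft()
        old_phys = self.spec_map[dest]
        self.spec_map[dest] = new_phys
        self.in_flight.append((dest, new_phys, old_phys))
        return new_phys, src_phys

    def commit(self):
        """Commit the oldest in-flight instruction, in program order."""
        dest, new_phys, old_phys = self.in_flight.popleft()
        self.arch_map[dest] = new_phys   # mapping becomes architectural
        self.free.append(old_phys)       # the older value is released

    def squash(self):
        """Discard all uncommitted instructions after a misspeculation."""
        while self.in_flight:
            _, new_phys, _ = self.in_flight.pop()
            self.free.append(new_phys)
        self.spec_map = list(self.arch_map)


if __name__ == "__main__":
    rt = RenameTable(num_arch=4, num_phys=8)
    print(rt.rename(1, [2, 3]))  # first write to R1 gets a fresh register
    print(rt.rename(1, [1]))     # second write to R1 gets another one
    rt.commit()                  # the first write becomes architectural
    rt.squash()                  # the second write is discarded
    print(rt.spec_map)
```

Because each write to an architectural register receives a fresh physical register, two writes to the same register never collide (no WAW hazard), and a later write cannot overwrite a value that an earlier reader still needs (no WAR hazard). Recovery in this model amounts to restoring the speculative map from the committed one.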

An advantage of the renaming approach versus the ROB approach is that instruction com-

mit is slightly simplified, since it requires only two simple actions: (1) record that the mapping

between an architectural register number and physical register number is no longer speculat-

ive, and (2) free up any physical registers being used to hold the “older” value of the architec-

tural register. In a design with reservation stations, a station is freed up when the instruction
