Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach

exactly those exceptions that would arise if the program were executed in strict program order actually do arise. Dynamically scheduled processors preserve exception behavior by delaying the notification of an associated exception until the processor knows that the instruction should be the next one completed.

Although exception behavior must be preserved, dynamically scheduled processors could generate imprecise exceptions. An exception is imprecise if the processor state when an exception is raised does not look exactly as if the instructions were executed sequentially in strict program order. Imprecise exceptions can occur because of two possibilities:

1. The pipeline may have already completed instructions that are later in program order than the instruction causing the exception.

2. The pipeline may not yet have completed some instructions that are earlier in program order than the instruction causing the exception.
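The two conditions above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical representation in which `completed[i]` records whether instruction `i` (numbered in program order) has completed when instruction `faulting_index` raises its exception. The function name and the dictionary it returns are illustrative and do not come from the text.

```python
def classify_exception(completed, faulting_index):
    """Report whether the state at an exception looks precise.

    completed      -- list of bools, one per instruction in program order
                      (hypothetical representation, assumed for this sketch)
    faulting_index -- position of the instruction that raised the exception
    """
    # Condition 1: instructions later in program order have already completed.
    later_completed = [
        i for i, done in enumerate(completed)
        if i > faulting_index and done
    ]
    # Condition 2: instructions earlier in program order have not yet completed.
    earlier_pending = [
        i for i, done in enumerate(completed)
        if i < faulting_index and not done
    ]
    return {
        "precise": not later_completed and not earlier_pending,
        "later_completed": later_completed,
        "earlier_pending": earlier_pending,
    }


# Instruction 2 faults while instruction 3 is already done and
# instruction 1 is still in flight: both conditions hold at once.
report = classify_exception([True, False, False, True], faulting_index=2)
print(report)
```

In this example both lists are non-empty, so the state is imprecise for both of the reasons listed above.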

Imprecise exceptions make it difficult to restart execution after an exception. Rather than address these problems in this section, we will discuss a solution that provides precise exceptions in the context of a processor with speculation in Section 3.6. For floating-point exceptions, other solutions have been used, as discussed in Appendix J.

To allow out-of-order execution, we essentially split the ID pipe stage of our simple five-stage pipeline into two stages:

1. Issue: Decode instructions, check for structural hazards.

2. Read operands: Wait until no data hazards, then read operands.

An instruction fetch stage precedes the issue stage and may fetch either into an instruction register or into a queue of pending instructions; instructions are then issued from the register or queue. The execution stage follows the read operands stage, just as in the five-stage pipeline. Execution may take multiple cycles, depending on the operation.

We distinguish when an instruction begins execution and when it completes execution; between the two times, the instruction is in execution. Our pipeline allows multiple instructions to be in execution at the same time; without this capability, a major advantage of dynamic scheduling is lost. Having multiple instructions in execution at once requires multiple functional units, pipelined functional units, or both. Since these two capabilities (pipelined functional units and multiple functional units) are essentially equivalent for the purposes of pipeline control, we will assume the processor has multiple functional units.
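The split into an issue stage and a read-operands stage can be illustrated with a toy cycle model. This is a sketch under stated assumptions, not the control logic of any real processor: one instruction issues per cycle, each named functional unit holds one instruction from issue until it finishes, only read-after-write hazards are checked in the read-operands stage, and antidependences and output dependences are ignored. The names `Instr`, `simulate`, and `demo_program`, and the latencies, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str
    unit: str                 # functional unit this instruction needs
    dest: str                 # destination register
    srcs: Tuple[str, ...]     # source registers
    latency: int              # execution cycles (assumed values)
    issue: Optional[int] = None
    start: Optional[int] = None    # cycle in which operands are read
    finish: Optional[int] = None   # last cycle of execution


def simulate(program, units):
    """Toy model: in-order issue, out-of-order start of execution."""
    busy_until = {u: 0 for u in units}   # last cycle each unit is occupied
    next_issue = 0
    cycle = 0
    while any(ins.finish is None for ins in program):
        cycle += 1
        # Read operands: any already-issued instruction may start once no
        # earlier instruction still has to produce one of its sources.
        for k, ins in enumerate(program[:next_issue]):
            if ins.start is not None:
                continue
            raw_hazard = any(
                p.dest in ins.srcs and (p.finish is None or p.finish >= cycle)
                for p in program[:k]
            )
            if not raw_hazard:
                ins.start = cycle
                ins.finish = cycle + ins.latency
                busy_until[ins.unit] = ins.finish
        # Issue: strictly in program order, and only if the unit is free
        # (the structural-hazard check).
        if next_issue < len(program):
            ins = program[next_issue]
            if busy_until[ins.unit] < cycle:
                ins.issue = cycle
                busy_until[ins.unit] = float("inf")  # held until finish is known
                next_issue += 1
    return program


def demo_program():
    return [
        Instr("DIV", "div", "F0", ("F2", "F4"), latency=10),
        Instr("ADD", "add", "F10", ("F0", "F8"), latency=2),   # needs F0 from DIV
        Instr("MUL", "mul", "F12", ("F8", "F14"), latency=3),  # independent
    ]


for ins in simulate(demo_program(), ["div", "add", "mul"]):
    print(ins.name, "issue", ins.issue, "start", ins.start, "finish", ins.finish)
```

In this model the three instructions issue in program order, but the multiply begins execution while the add is still waiting in the read-operands stage for the divide to produce F0.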

In a dynamically scheduled pipeline, all instructions pass through the issue stage in order (in-order issue); however, they can be stalled or bypass each other in the second stage (read operands) and thus enter execution out of order. Scoreboarding is a technique for allowing instructions to execute out of order when there are sufficient resources and no data dependences; it is named after the CDC 6600 scoreboard, which developed this capability. Here, we focus on a more sophisticated technique, called Tomasulo's algorithm. The primary difference is that Tomasulo's algorithm handles antidependences and output dependences by effectively renaming the registers dynamically. Additionally, Tomasulo's algorithm can be extended to handle speculation, a technique to reduce the effect of control dependences by predicting the outcome of a branch, executing instructions at the predicted destination address, and taking corrective actions when the prediction was wrong. While the use of scoreboarding is probably sufficient to support a simple two-issue superscalar like the ARM A8, a more aggressive processor, like the four-issue Intel i7, benefits from the use of out-of-order execution.
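To see why renaming removes antidependences (WAR) and output dependences (WAW), consider the sketch below. It is not Tomasulo's algorithm itself: it uses a plain rename table that hands out a fresh name for every destination, which is a simplified stand-in for the renaming the algorithm performs in hardware. The tuple format `(dest, src1, src2)`, the fresh names `T0`, `T1`, ..., and both function names are assumptions made for illustration; the sketch also assumes no architectural register name starts with `T`.

```python
def name_hazards(program):
    """List WAR and WAW hazards between instruction pairs (i before j)."""
    hazards = []
    for j, (dest_j, *_srcs_j) in enumerate(program):
        for i in range(j):
            dest_i, *srcs_i = program[i]
            if dest_j in srcs_i:
                hazards.append(("WAR", i, j))   # j overwrites a source of i
            if dest_j == dest_i:
                hazards.append(("WAW", i, j))   # i and j write the same name
    return hazards


def rename(program):
    """Give every destination a fresh name; sources follow the latest mapping."""
    mapping = {}          # architectural register -> most recent fresh name
    renamed = []
    for count, (dest, *srcs) in enumerate(program):
        # Read sources through the table first, so true (RAW) dependences
        # still point at the instruction that produces the value.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dest = f"T{count}"
        mapping[dest] = new_dest
        renamed.append((new_dest, *new_srcs))
    return renamed


# Hypothetical four-instruction sequence, written as (dest, src1, src2).
program = [
    ("F0", "F2", "F4"),
    ("F6", "F0", "F8"),
    ("F8", "F10", "F14"),   # WAR with instruction 1, which reads F8
    ("F6", "F10", "F8"),    # WAW with instruction 1, which writes F6
]

print(name_hazards(program))
print(rename(program))
print(name_hazards(rename(program)))
```

Before renaming, the hazard list contains one WAR and one WAW entry. After renaming it is empty, while instruction 1 still reads the renamed result of instruction 0 and instruction 3 still reads the renamed result of instruction 2.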
