Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach


and show how a VLIW capable of two loads and two adds per cycle can use the minimum number of registers, in the absence of any pipeline interruptions or stalls. Give an example of an event that, in the presence of self-draining pipelines, could disrupt this pipelining and yield wrong results.

FIGURE 3.53 Sample VLIW code with two adds, two loads, and two stalls.

3.11 [10/10/10] <3.3> Assume a five-stage single-pipeline microarchitecture (fetch, decode, execute, memory, write-back) and the code in Figure 3.54. All ops are one cycle except LW and SW, which are 1 + 2 cycles, and branches, which are 1 + 1 cycles. There is no forwarding. Show the phases of each instruction per clock cycle for one iteration of the loop.

a. [10] <3.3> How many clock cycles per loop iteration are lost to branch overhead?

b. [10] <3.3> Assume a static branch predictor, capable of recognizing a backwards branch in the Decode stage. Now how many clock cycles are wasted on branch overhead?

c. [10] <3.3> Assume a dynamic branch predictor. How many cycles are lost on a correct prediction?

FIGURE 3.54 Code loop for Exercise 3.11.
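Because the loop of Figure 3.54 is not reproduced here, the following is a minimal sketch of the bookkeeping the exercise asks for, applied to a hypothetical three-instruction loop (not the book's loop). For an in-order, single-issue five-stage pipeline with interlocks and no forwarding, it computes the cycle in which each instruction enters each stage and prints a stage-per-cycle chart. Several modeling choices are assumptions, not part of the exercise statement: LW and SW occupy MEM for 1 + 2 = 3 cycles, the register file is written in WB and can be read by ID in that same cycle, and the instruction after a branch is fetched only once the branch leaves the stage named by the hypothetical `branch_resolve_stage` parameter. The names `Instr`, `schedule`, `chart`, and `prog` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STAGES = ["IF", "ID", "EX", "MEM", "WB"]
EX = STAGES.index("EX")


@dataclass
class Instr:
    text: str
    dest: Optional[str] = None       # register written in WB (None for SW, branches)
    srcs: Tuple[str, ...] = ()       # registers read in ID
    mem_cycles: int = 1              # cycles spent in MEM (assumed 3 for LW/SW)
    is_branch: bool = False


def schedule(program: List[Instr], branch_resolve_stage: str = "ID") -> List[List[int]]:
    """Cycle (1-based) in which each instruction enters each stage.

    Assumptions: in-order single issue, interlocks, no forwarding; a register
    written in WB can be read by ID in the same cycle; the instruction after
    a branch is fetched only once the branch leaves branch_resolve_stage.
    """
    resolve = STAGES.index(branch_resolve_stage)
    prev_leave = [1] * len(STAGES)   # cycle the predecessor vacated each stage
    wb_cycle = {}                    # register -> WB cycle of its latest producer
    fetch_ok = 1                     # earliest cycle allowed for the next IF
    table = []
    for ins in program:
        dur = [1, 1, 1, ins.mem_cycles, 1]
        start: List[int] = []
        for s in range(len(STAGES)):
            # The stage must be free: the previous instruction has moved on.
            t = prev_leave[s]
            if s == 0:
                t = max(t, fetch_ok)                      # control hazard
            else:
                t = max(t, start[s - 1] + dur[s - 1])     # finish previous stage
            if s == EX:                                   # RAW hazard, no forwarding
                for r in ins.srcs:
                    if r in wb_cycle:
                        t = max(t, wb_cycle[r] + 1)
            start.append(t)
        # An instruction leaves stage s when it enters s + 1; WB lasts one cycle.
        prev_leave = start[1:] + [start[-1] + 1]
        if ins.dest is not None:
            wb_cycle[ins.dest] = start[-1]
        if ins.is_branch:
            fetch_ok = prev_leave[resolve]
        table.append(start)
    return table


def chart(program: List[Instr], table: List[List[int]]) -> None:
    """Print the stage each instruction occupies in every cycle."""
    last = max(row[-1] for row in table)
    print(" " * 28 + "".join(f"{c:>4}" for c in range(1, last + 1)))
    for ins, start in zip(program, table):
        leave = start[1:] + [start[-1] + 1]
        cells = [""] * (last + 1)
        for s, name in enumerate(STAGES):
            for c in range(start[s], leave[s]):
                cells[c] = name
        print(f"{ins.text:<28}" + "".join(f"{x:>4}" for x in cells[1:]))


# Hypothetical loop body (NOT Figure 3.54), plus the next iteration's first fetch.
prog = [
    Instr("LW   R1,0(R2)", dest="R1", srcs=("R2",), mem_cycles=3),
    Instr("ADDI R2,R2,#4", dest="R2", srcs=("R2",)),
    Instr("BNZ  R1,Loop", srcs=("R1",), is_branch=True),
    Instr("LW   R1,0(R2)  ; next iter", dest="R1", srcs=("R2",), mem_cycles=3),
]

if __name__ == "__main__":
    chart(prog, schedule(prog, branch_resolve_stage="ID"))
```

In the chart, a stage name repeated across consecutive columns marks a stall (or, for MEM of a load, the assumed 3-cycle access). Comparing `branch_resolve_stage="ID"` with `"EX"` shows what earlier branch resolution saves, and `"IF"` roughly approximates a correctly predicted branch with no fetch bubble. In this hypothetical loop the branch also waits on a data hazard (R1 from the load), so not every lost cycle is branch overhead.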

3.12 [15/20/20/10/20] <3.4, 3.7, 3.14> Let's consider what dynamic scheduling might achieve here. Assume a microarchitecture as shown in Figure 3.55. Assume that the arithmetic-logical units (ALUs) can do all arithmetic ops (MULTD, DIVD, ADDD, ADDI, SUB) and branches, and that the Reservation Station (RS) can dispatch at most one operation to each functional unit
