Hardware Reference
In-Depth Information
and show how a VLIW capable of two loads and two adds per cycle can use the minimum
number of registers, in the absence of any pipeline interruptions or stalls. Give an example
of an event that, in the presence of self-draining pipelines, could disrupt this pipelining
and yield wrong results.
FIGURE 3.53 Sample VLIW code with two adds, two loads, and two stalls .
3.11 [10/10/10] <3.3> Assume a five-stage single-pipeline microarchitecture (fetch, decode, ex-
ecute, memory, write-back) and the code in Figure 3.54 . All ops are one cycle except LW and
SW , which are 1 + 2 cycles, and branches, which are 1 + 1 cycles. There is no forwarding.
Show the phases of each instruction per clock cycle for one iteration of the loop.
a. [10] <3.3> How many clock cycles per loop iteration are lost to branch overhead?
b. [10] <3.3> Assume a static branch predictor, capable of recognizing a backwards
branch in the Decode stage. Now how many clock cycles are wasted on branch over-
head?
c. [10] <3.3> Assume a dynamic branch predictor. How many cycles are lost on a correct
prediction?
FIGURE 3.54 Code loop for Exercise 3.11 .
3.12 [15/20/20/10/20] <3.4, 3.7, 3.14> Let's consider what dynamic scheduling might achieve
here. Assume a microarchitecture as shown in Figure 3.55 . Assume that the arithmetic-lo-
gical units (ALUs) can do all arithmetic ops ( MULTD, DIVD, ADDD, ADDI, SUB ) and branches, and
that the Reservation Station (RS) can dispatch at most one operation to each functional unit
 
 
 
Search WWH ::




Custom Search