DADDI R1,R1,#1 ;R1=R1+1
SD R1,0(R2) ;store R1 at address 0+R2
DADDI R2,R2,#4 ;R2=R2+4
DSUB R4,R3,R2 ;R4=R3-R2
BNEZ R4,Loop ;branch to Loop if R4!=0
Assume that the initial value of R3 is R2 + 396.
a. [15] <C.2> Data hazards are caused by data dependences in the code. Whether a
dependency causes a hazard depends on the machine implementation (i.e., number of
pipeline stages). List all of the data dependences in the code above. Record the
register, source instruction, and destination instruction; for example, there is a data
dependency for register R1 from the LD to the DADDI.
b. [15] <C.2> Show the timing of this instruction sequence for the 5-stage RISC pipeline
without any forwarding or bypassing hardware but assuming that a register read and
a write in the same clock cycle “forwards” through the register file, as shown in
Figure C.6. Use a pipeline timing chart like that in Figure C.5. Assume that the branch
is handled by flushing the pipeline. If all memory references take 1 cycle, how many
cycles does this loop take to execute?
c. [15] <C.2> Show the timing of this instruction sequence for the 5-stage RISC pipeline
with full forwarding and bypassing hardware. Use a pipeline timing chart like that
shown in Figure C.5. Assume that the branch is handled by predicting it as not taken.
If all memory references take 1 cycle, how many cycles does this loop take to execute?
d. [15] <C.2> Show the timing of this instruction sequence for the 5-stage RISC pipeline
with full forwarding and bypassing hardware. Use a pipeline timing chart like that
shown in Figure C.5. Assume that the branch is handled by predicting it as taken. If
all memory references take 1 cycle, how many cycles does this loop take to execute?
e. [25] <C.2> High-performance processors have very deep pipelines (more than 15
stages). Imagine that you have a 10-stage pipeline in which every stage of the 5-stage
pipeline has been split in two. The only catch is that, for data forwarding, data are
forwarded from the end of a pair of stages to the beginning of the two stages where
they are needed. For example, data are forwarded from the output of the second
execute stage to the input of the first execute stage, still causing a 1-cycle delay. Show the
timing of this instruction sequence for the 10-stage RISC pipeline with full forwarding
and bypassing hardware. Use a pipeline timing chart like that shown in Figure C.5.
Assume that the branch is handled by predicting it as taken. If all memory references
take 1 cycle, how many cycles does this loop take to execute?
f. [10] <C.2> Assume that in the 5-stage pipeline the longest stage requires 0.8 ns, and the
pipeline register delay is 0.1 ns. What is the clock cycle time of the 5-stage pipeline?
If the 10-stage pipeline splits all stages in half, what is the cycle time of the 10-stage
machine?
g. [15] <C.2> Using your answers from parts (d) and (e), determine the cycles per
instruction (CPI) for the loop on a 5-stage pipeline and a 10-stage pipeline. Make sure you
count only from when the first instruction reaches the write-back stage to the end. Do
not count the start-up of the first instruction. Using the clock cycle time calculated in
part (f), calculate the average instruction execute time for each machine.
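As an aid for checking hand-drawn charts, the following Python sketch shows one way the bookkeeping in parts (a), (b), (f), and (g) might be mechanized. It is a simplified model under stated assumptions, not a worked solution. It covers only the five instructions visible in the listing above (not the LD referred to in part (a)) and only dependences within a single iteration. It treats SD and BNEZ as writing no register. For the no-forwarding case it assumes a dependent instruction may enter ID in the same cycle its producer is in WB. It takes the clock cycle time to be the longest stage delay plus the pipeline register delay, with the register delay unchanged when stages are split. All function and variable names are illustrative and do not come from the text.

```python
# Illustrative model of the five instructions visible in the listing.
# Each entry: (label, destination register or None, source registers).
INSTRUCTIONS = [
    ("DADDI R1,R1,#1", "R1", ["R1"]),
    ("SD R1,0(R2)", None, ["R1", "R2"]),
    ("DADDI R2,R2,#4", "R2", ["R2"]),
    ("DSUB R4,R3,R2", "R4", ["R3", "R2"]),
    ("BNEZ R4,Loop", None, ["R4"]),
]


def raw_dependences(instructions):
    """Return (register, producer, consumer) for each read-after-write pair."""
    last_writer = {}  # register -> label of the most recent earlier writer
    found = []
    for label, dest, sources in instructions:
        for reg in sources:
            if reg in last_writer:
                found.append((reg, last_writer[reg], label))
        if dest is not None:
            last_writer[dest] = label
    return found


def id_wb_cycles_no_forwarding(instructions, first_if=1):
    """Return (label, ID cycle, WB cycle) for a 5-stage pipeline, no forwarding.

    Assumption: the register file is written in the first half of a cycle and
    read in the second half, so a consumer's ID may coincide with the
    producer's WB but may not come earlier.
    """
    wb_of = {}  # register -> WB cycle of its most recent writer
    prev_id = first_if  # the first instruction decodes one cycle after its IF
    rows = []
    for label, dest, sources in instructions:
        id_cycle = prev_id + 1  # in-order: at most one instruction per cycle
        for reg in sources:
            if reg in wb_of:
                id_cycle = max(id_cycle, wb_of[reg])
        wb_cycle = id_cycle + 3  # EX, MEM, WB follow ID
        if dest is not None:
            wb_of[dest] = wb_cycle
        rows.append((label, id_cycle, wb_cycle))
        prev_id = id_cycle
    return rows


def cycle_time_ns(longest_stage_ns, register_delay_ns, split_factor=1):
    """Clock period, assuming splitting divides stage delay evenly."""
    return longest_stage_ns / split_factor + register_delay_ns


def avg_instruction_time_ns(counted_cycles, instruction_count, clock_ns):
    """Average instruction time = CPI x clock cycle time."""
    cpi = counted_cycles / instruction_count
    return cpi * clock_ns


deps = raw_dependences(INSTRUCTIONS)
timing = id_wb_cycles_no_forwarding(INSTRUCTIONS)
for reg, producer, consumer in deps:
    print(f"{reg}: {producer} -> {consumer}")
for label, id_cycle, wb_cycle in timing:
    print(f"{label:<16} ID={id_cycle:>2} WB={wb_cycle:>2}")
```

The cycle counts passed to `avg_instruction_time_ns` should come from your own charts in parts (d) and (e), counted from the first write-back as part (g) requires. The sketch does not model forwarding, branch prediction, or the 10-stage pipeline, so it cannot produce those counts itself.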
C.2 [15/15] <C.2> Suppose the branch frequencies (as percentages of all instructions) are as
follows: