Pipelining: Basic and Intermediate Concepts - Computer Architecture: A Quantitative Approach

DADDI R1,R1,#1 ;R1=R1+1

SD R1,0(R2) ;store R1 at address 0+R2

DADDI R2,R2,#4 ;R2=R2+4

DSUB R4,R3,R2 ;R4=R3-R2

BNEZ R4,Loop ;branch to Loop if R4!=0
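As a rough aid for tracing register usage in the listing above, the following Python sketch enumerates read-after-write (true) dependences among the five instructions visible in this excerpt. The `(text, dest, sources)` encoding and the name `raw_dependences` are illustrative assumptions, not part of the exercise. The sketch treats the code as straight-line (it ignores dependences carried around the loop) and reports only the most recent writer of each register. Any instruction that precedes this excerpt would have to be prepended to the list by hand.

```python
# Each entry: (instruction text, destination register or None, source registers).
# This encoding is an assumption made purely for illustration.
INSTRUCTIONS = [
    ("DADDI R1,R1,#1", "R1", ["R1"]),
    ("SD R1,0(R2)", None, ["R1", "R2"]),   # a store writes memory, not a register
    ("DADDI R2,R2,#4", "R2", ["R2"]),
    ("DSUB R4,R3,R2", "R4", ["R3", "R2"]),
    ("BNEZ R4,Loop", None, ["R4"]),
]


def raw_dependences(instructions):
    """Return (register, producer, consumer) triples in program order."""
    last_writer = {}   # register -> text of the latest instruction that wrote it
    deps = []
    for text, dest, sources in instructions:
        # Check sources before recording this instruction's own write, so that
        # "DADDI R1,R1,#1" is not reported as depending on itself.
        for reg in sources:
            if reg in last_writer:
                deps.append((reg, last_writer[reg], text))
        if dest is not None:
            last_writer[dest] = text
    return deps


for reg, producer, consumer in raw_dependences(INSTRUCTIONS):
    print(f"{reg}: {producer} -> {consumer}")
```

Whether any of these dependences turns into a stall still depends on the pipeline organization and on the forwarding hardware available.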

Assume that the initial value of R3 is R2 + 396.
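Because each pass adds 4 to R2 and the branch falls through once R3 - R2 reaches zero, the initial gap of 396 fixes the number of loop iterations. A minimal Python sketch of that bookkeeping follows; the function name `count_iterations` and the starting address 0 are arbitrary assumptions (only the difference between R3 and R2 matters), and the sketch assumes the gap is a positive multiple of the stride.

```python
def count_iterations(r2_start=0, gap=396, stride=4):
    """Mimic the loop's control flow: bump R2, then test R4 = R3 - R2."""
    r2 = r2_start
    r3 = r2_start + gap      # initial condition stated in the exercise
    iterations = 0
    while True:
        iterations += 1
        r2 += stride         # DADDI R2,R2,#4
        r4 = r3 - r2         # DSUB  R4,R3,R2
        if r4 == 0:          # BNEZ is not taken once R4 == 0
            break
    return iterations


print(count_iterations())  # 99
```

The same number comes from the division 396 / 4 = 99; the branch is taken on every iteration except the last.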

a. [15] <C.2> Data hazards are caused by data dependences in the code. Whether a dependency causes a hazard depends on the machine implementation (i.e., number of pipeline stages). List all of the data dependences in the code above. Record the register, source instruction, and destination instruction; for example, there is a data dependency for register R1 from the LD to the DADDI.

b. [15] <C.2> Show the timing of this instruction sequence for the 5-stage RISC pipeline without any forwarding or bypassing hardware but assuming that a register read and a write in the same clock cycle “forwards” through the register file, as shown in Figure C.6. Use a pipeline timing chart like that in Figure C.5. Assume that the branch is handled by flushing the pipeline. If all memory references take 1 cycle, how many cycles does this loop take to execute?

c. [15] <C.2> Show the timing of this instruction sequence for the 5-stage RISC pipeline with full forwarding and bypassing hardware. Use a pipeline timing chart like that shown in Figure C.5. Assume that the branch is handled by predicting it as not taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

d. [15] <C.2> Show the timing of this instruction sequence for the 5-stage RISC pipeline with full forwarding and bypassing hardware. Use a pipeline timing chart like that shown in Figure C.5. Assume that the branch is handled by predicting it as taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

e. [25] <C.2> High-performance processors have very deep pipelines (more than 15 stages). Imagine that you have a 10-stage pipeline in which every stage of the 5-stage pipeline has been split in two. The only catch is that, for data forwarding, data are forwarded from the end of a pair of stages to the beginning of the two stages where they are needed. For example, data are forwarded from the output of the second execute stage to the input of the first execute stage, still causing a 1-cycle delay. Show the timing of this instruction sequence for the 10-stage RISC pipeline with full forwarding and bypassing hardware. Use a pipeline timing chart like that shown in Figure C.5. Assume that the branch is handled by predicting it as taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

f. [10] <C.2> Assume that in the 5-stage pipeline the longest stage requires 0.8 ns, and the pipeline register delay is 0.1 ns. What is the clock cycle time of the 5-stage pipeline? If the 10-stage pipeline splits all stages in half, what is the cycle time of the 10-stage machine?

g. [15] <C.2> Using your answers from parts (d) and (e), determine the cycles per instruction (CPI) for the loop on a 5-stage pipeline and a 10-stage pipeline. Make sure you count only from when the first instruction reaches the write-back stage to the end. Do not count the start-up of the first instruction. Using the clock cycle time calculated in part (f), calculate the average instruction execute time for each machine.
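The arithmetic behind parts (f) and (g) can be sketched in a few lines of Python. The sketch assumes the usual model in which the clock period is the longest stage delay plus the pipeline register delay, and in which splitting a stage halves its logic delay but leaves the register delay unchanged. The function names are illustrative, and the cycle and instruction counts passed in the last call are placeholders only, to be replaced with the totals from your own timing charts.

```python
def cycle_time_ns(longest_stage_ns, register_delay_ns):
    """Clock period = slowest stage logic + pipeline register overhead."""
    return longest_stage_ns + register_delay_ns


def avg_instruction_time_ns(cycles, instructions, clock_ns):
    """Average time per instruction = CPI * clock cycle time."""
    cpi = cycles / instructions
    return cpi * clock_ns


five_stage = cycle_time_ns(0.8, 0.1)       # about 0.9 ns
ten_stage = cycle_time_ns(0.8 / 2, 0.1)    # 0.5 ns: logic halves, register delay does not

print(f"5-stage clock: {five_stage:.1f} ns, 10-stage clock: {ten_stage:.1f} ns")

# Placeholder counts, NOT the answer to part (g): substitute your own totals.
example = avg_instruction_time_ns(cycles=12, instructions=6, clock_ns=five_stage)
print(f"example average instruction time: {example:.2f} ns")
```

Note that halving the stage logic does not halve the clock period, because the fixed register delay becomes a larger share of each cycle; the deeper pipeline pays off only if its CPI does not rise by more than the clock improves.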

C.2 [15/15] <C.2> Suppose the branch frequencies (as percentages of all instructions) are as follows:
