Pipelining: Basic and Intermediate Concepts - Computer Architecture: A Quantitative Approach


Conditional branches          15%
Jumps and calls               1%
Taken conditional branches    60% are taken

a. [15] <C.2> We are examining a four-deep pipeline where the branch is resolved at the end of the second cycle for unconditional branches and at the end of the third cycle for conditional branches. Assuming that only the first pipe stage can always be done independent of whether the branch goes and ignoring other pipeline stalls, how much faster would the machine be without any branch hazards?

b. [15] <C.2> Now assume a high-performance processor in which we have a 15-deep pipeline where the branch is resolved at the end of the fifth cycle for unconditional branches and at the end of the tenth cycle for conditional branches. Assuming that only the first pipe stage can always be done independent of whether the branch goes and ignoring other pipeline stalls, how much faster would the machine be without any branch hazards?
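The arithmetic behind parts (a) and (b) can be sketched in Python as follows. This is one possible reading of the problem, not an official solution. It assumes an ideal CPI of 1, that the percentages above are fractions of all instructions, and that a branch resolved at the end of cycle k costs k - 1 stall cycles when taken, while a not-taken conditional branch costs one cycle fewer because its successor has already completed the first pipe stage. The function and parameter names are made up for illustration.

```python
def branch_stall_cpi(resolve_uncond, resolve_cond,
                     f_cond=0.15, f_jump=0.01, p_taken=0.60):
    """Average branch stall cycles per instruction (assumed stall model)."""
    # Assumption: a taken branch resolved at the end of cycle k stalls k - 1 cycles.
    taken_uncond = resolve_uncond - 1
    taken_cond = resolve_cond - 1
    # Assumption: the fall-through instruction already did stage 1, so one stall fewer.
    not_taken_cond = resolve_cond - 2
    return (f_jump * taken_uncond
            + f_cond * p_taken * taken_cond
            + f_cond * (1.0 - p_taken) * not_taken_cond)


def speedup_without_branch_hazards(resolve_uncond, resolve_cond):
    """Ratio of CPI with branch stalls to the ideal CPI of 1."""
    ideal_cpi = 1.0
    return (ideal_cpi + branch_stall_cpi(resolve_uncond, resolve_cond)) / ideal_cpi


speedup_a = speedup_without_branch_hazards(2, 3)    # four-deep pipeline
speedup_b = speedup_without_branch_hazards(5, 10)   # 15-deep pipeline
print(f"(a) {speedup_a:.2f}x  (b) {speedup_b:.2f}x")
```

Under these assumptions the sketch gives about 1.25 for part (a) and about 2.33 for part (b). A different reading of the stall rule changes the per-branch penalties but not the structure of the calculation.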

C.3 [5/15/10/10] <C.2> We begin with a computer implemented as a single-cycle implementation. When the stages are split by functionality, the stages do not require exactly the same amount of time. The original machine had a clock cycle time of 7 ns. After the stages were split, the measured times were IF, 1 ns; ID, 1.5 ns; EX, 1 ns; MEM, 2 ns; and WB, 1.5 ns. The pipeline register delay is 0.1 ns.

a. [5] <C.2> What is the clock cycle time of the 5-stage pipelined machine?

b. [15] <C.2> If there is a stall every 4 instructions, what is the CPI of the new machine?

c. [10] <C.2> What is the speedup of the pipelined machine over the single-cycle machine?

d. [10] <C.2> If the pipelined machine had an infinite number of stages, what would its speedup be over the single-cycle machine?
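A minimal Python sketch of the quantities in C.3, using the stage times given above. It assumes that "a stall every 4 instructions" means one stall cycle per four instructions, that the single-cycle machine has a CPI of 1, and that with infinitely many stages the cycle time approaches the 0.1 ns register delay. Whether stalls still apply in that limit is left open, so both variants are computed. The variable names are illustrative.

```python
# Measured stage times in nanoseconds, taken from the problem statement.
stage_ns = {"IF": 1.0, "ID": 1.5, "EX": 1.0, "MEM": 2.0, "WB": 1.5}
register_delay_ns = 0.1
single_cycle_ns = 7.0

# (a) The slowest stage plus the pipeline register delay sets the clock.
cycle_ns = max(stage_ns.values()) + register_delay_ns

# (b) Assumption: one stall cycle per 4 instructions, so 5 cycles per 4 instructions.
cpi = (4 + 1) / 4

# (c) Time per instruction, single-cycle versus pipelined.
speedup = single_cycle_ns / (cycle_ns * cpi)

# (d) Assumption: stage logic delay goes to zero, leaving only the register delay.
limit_speedup_no_stalls = single_cycle_ns / register_delay_ns
limit_speedup_with_stalls = single_cycle_ns / (register_delay_ns * cpi)

print(f"cycle = {cycle_ns:.1f} ns, CPI = {cpi:.2f}, speedup = {speedup:.2f}")
print(f"limit: {limit_speedup_no_stalls:.0f}x without stalls, "
      f"{limit_speedup_with_stalls:.0f}x with stalls")
```

The pipelined speedup falls well short of 5 because the stages are unbalanced (the 2 ns MEM stage sets the clock) and because of the register overhead and stalls.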

C.4 [15] <C.1, C.2> A reduced hardware implementation of the classic five-stage RISC pipeline might use the EX stage hardware to perform a branch instruction comparison and then not actually deliver the branch target PC to the IF stage until the clock cycle in which the branch instruction reaches the MEM stage. Control hazard stalls can be reduced by resolving branch instructions in ID, but improving performance in one respect may reduce performance in other circumstances. Write a small snippet of code in which calculating the branch in the ID stage causes a data hazard, even with data forwarding.
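The timing behind C.4 can be sketched with a small Python model. The two-instruction sequence in the comment is a hypothetical example with MIPS-style mnemonics, not text from the exercise. The model assumes the classic IF, ID, EX, MEM, WB ordering and full forwarding, meaning a value produced at the end of one cycle can be consumed at the start of the next. The function `branch_stalls` is an illustrative helper, not part of any library.

```python
# Hypothetical sequence (MIPS-style mnemonics, for illustration only):
#   DADD R1, R2, R3     ; R1 is produced at the end of EX
#   BEQZ R1, target     ; reads R1 in ID when branches are resolved in ID
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def branch_stalls(producer_ready_stage, branch_read_stage="ID", distance=1):
    """Stall cycles for a branch issued `distance` instructions after its producer.

    Cycle 0 is the producer's IF. Assumes full forwarding: a value ready at
    the end of cycle c can be used by a stage that starts in cycle c + 1.
    """
    ready_cycle = STAGES.index(producer_ready_stage)          # result ready at end of this cycle
    need_cycle = distance + STAGES.index(branch_read_stage)   # unstalled cycle in which the branch reads it
    return max(0, ready_cycle + 1 - need_cycle)


print(branch_stalls("EX"))                          # ALU op, then branch resolved in ID
print(branch_stalls("MEM"))                         # load, then branch resolved in ID
print(branch_stalls("EX", branch_read_stage="EX"))  # same ALU op, branch compared in EX
```

In this model an ALU instruction immediately followed by a dependent branch costs one stall when the branch is resolved in ID and none when the comparison is done in EX, which is the trade-off the exercise describes.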

C.5 [12/13/20/20/15/15] <C.2, C.3> For these problems, we will explore a pipeline for a register-memory architecture. The architecture has two instruction formats: a register-register format and a register-memory format. There is a single memory addressing mode (offset + base register). There is a set of ALU operations with the format:

ALUop Rdest, Rsrc1, Rsrc2

or

ALUop Rdest, Rsrc1, MEM

where the ALUop is one of the following: add, subtract, AND, OR, load (Rsrc1 ignored), or store. Rsrc and Rdest are registers. MEM is a base register and offset pair. Branches use a full compare of two registers and are PC relative. Assume that this machine is pipelined so that a new instruction is started every clock cycle. The pipeline structure, similar to that used in the VAX 8700 micropipeline [Clark 1987], is
