Conditional branches          15%
Jumps and calls               1%
Taken conditional branches    60% are taken
a. [15] <C.2> We are examining a four-deep pipeline where the branch is resolved at the
end of the second cycle for unconditional branches and at the end of the third cycle
for conditional branches. Assuming that only the first pipe stage can always be done
regardless of whether the branch is taken, and ignoring other pipeline stalls, how much
faster would the machine be without any branch hazards?
b. [15] <C.2> Now assume a high-performance processor in which we have a 15-deep
pipeline where the branch is resolved at the end of the fifth cycle for unconditional
branches and at the end of the tenth cycle for conditional branches. Assuming that
only the first pipe stage can always be done regardless of whether the branch is taken,
and ignoring other pipeline stalls, how much faster would the machine be without any
branch hazards?
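Both parts reduce to the same arithmetic: with an ideal CPI of 1, the speedup from removing branch hazards equals 1 plus the average number of branch stall cycles per instruction. A minimal Python sketch follows. The function name `speedup_without_branch_hazards` and the stall counts passed to it are illustrative assumptions, not part of the exercise. The usage lines take one possible reading, in which a branch resolved at the end of cycle k costs k - 1 stall cycles for every branch of that class. Because the frequency data singles out taken branches, another reading may charge taken and not-taken conditional branches differently; that can be expressed by splitting the conditional class into two entries.

```python
def speedup_without_branch_hazards(branch_classes):
    """Return CPI_with_hazards / CPI_ideal, assuming an ideal CPI of 1.

    branch_classes: iterable of (frequency, stall_cycles) pairs, where
    frequency is the fraction of all instructions in that class.
    """
    stall_cycles_per_instruction = sum(
        freq * stalls for freq, stalls in branch_classes
    )
    return 1.0 + stall_cycles_per_instruction


# Frequencies from the exercise data.
JUMPS_AND_CALLS = 0.01
CONDITIONAL = 0.15

# ASSUMPTION: a branch resolved at the end of cycle k costs k - 1 stall
# cycles, because only the first stage of the next instruction overlaps.
example_a = speedup_without_branch_hazards(
    [(JUMPS_AND_CALLS, 2 - 1), (CONDITIONAL, 3 - 1)]
)
example_b = speedup_without_branch_hazards(
    [(JUMPS_AND_CALLS, 5 - 1), (CONDITIONAL, 10 - 1)]
)

print(f"4-deep pipeline: {example_a:.2f}x, 15-deep pipeline: {example_b:.2f}x")
```

Under this reading the deeper pipeline loses far more to branches, since each conditional branch waits nine cycles instead of two.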
C.3 [5/15/10/10] <C.2> We begin with a computer implemented in single-cycle
implementation. When the stages are split by functionality, the stages do not require exactly the same
amount of time. The original machine had a clock cycle time of 7 ns. After the stages were
split, the measured times were IF, 1 ns; ID, 1.5 ns; EX, 1 ns; MEM, 2 ns; and WB, 1.5 ns. The
pipeline register delay is 0.1 ns.
a. [5] <C.2> What is the clock cycle time of the 5-stage pipelined machine?
b. [15] <C.2> If there is a stall every 4 instructions, what is the CPI of the new machine?
c. [10] <C.2> What is the speedup of the pipelined machine over the single-cycle
machine?
d. [10] <C.2> If the pipelined machine had an infinite number of stages, what would its
speedup be over the single-cycle machine?
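The arithmetic behind the four parts of C.3 can be sketched in a few lines of Python. This is one way to set the problem up, not an official solution. The variable names are illustrative, and the readings of "a stall every 4 instructions" and of the infinite-stage limit are assumptions, flagged in the comments.

```python
# Values stated in exercise C.3 (all times in nanoseconds).
SINGLE_CYCLE_TIME = 7.0
STAGE_TIMES = {"IF": 1.0, "ID": 1.5, "EX": 1.0, "MEM": 2.0, "WB": 1.5}
REGISTER_DELAY = 0.1

# (a) The slowest stage plus the pipeline register delay sets the clock.
pipelined_cycle_time = max(STAGE_TIMES.values()) + REGISTER_DELAY

# (b) ASSUMPTION: "a stall every 4 instructions" means one extra cycle
# per 4 instructions, so 4 instructions take 5 cycles.
cpi = (4 + 1) / 4

# (c) The single-cycle machine has a CPI of 1 at a 7 ns clock.
speedup = SINGLE_CYCLE_TIME / (pipelined_cycle_time * cpi)

# (d) ASSUMPTION: with infinitely many stages the logic per stage shrinks
# toward zero, the clock approaches the register delay, and stalls are ignored.
limit_speedup = SINGLE_CYCLE_TIME / REGISTER_DELAY

print(f"cycle time = {pipelined_cycle_time:.1f} ns, CPI = {cpi:.2f}")
print(f"speedup = {speedup:.2f}x, limiting speedup = {limit_speedup:.0f}x")
```

The stage times sum to the original 7 ns cycle, so the speedup falls short of 5 only because of the unbalanced stages, the register delay, and the stalls.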
C.4 [15] <C.1, C.2> A reduced hardware implementation of the classic five-stage RISC
pipeline might use the EX stage hardware to perform a branch instruction comparison and
then not actually deliver the branch target PC to the IF stage until the clock cycle in which
the branch instruction reaches the MEM stage. Control hazard stalls can be reduced by
resolving branch instructions in ID, but improving performance in one respect may reduce
performance in other circumstances. Write a small snippet of code in which calculating the
branch in the ID stage causes a data hazard, even with data forwarding.
C.5 [12/13/20/20/15/15] <C.2, C.3> For these problems, we will explore a pipeline for a
register-memory architecture. The architecture has two instruction formats: a
register-register format and a register-memory format. There is a single memory-addressing mode
(offset + base register). There is a set of ALU operations with the format:
ALUop Rdest, Rsrc1, Rsrc2
or
ALUop Rdest, Rsrc1, MEM
where the ALUop is one of the following: add, subtract, AND, OR, load (Rsrc1 ignored),
or store. Rsrc and Rdest are registers. MEM is a base register and offset pair. Branches use
a full compare of two registers and are PC relative. Assume that this machine is pipelined
so that a new instruction is started every clock cycle. The pipeline structure, similar to that
used in the VAX 8700 micropipeline [Clark 1987], is