a. [12] <C.1> List a rearranged order of the five traditional stages of the RISC pipeline
that will support register-memory operations implemented exclusively by register
indirect addressing.
b. [13] <C.2, C.3> Describe what new forwarding paths are needed for the rearranged
pipeline by stating the source, destination, and information transferred on each
needed new path.
c. [13] <C.2, C.3> For the reordered stages of the RISC pipeline, what new data hazards
are created by this addressing mode? Give an instruction sequence illustrating each
new hazard.
d. [15] <C.3> List all of the ways that the RISC pipeline with register-memory ALU
operations can have a different instruction count for a given program than the original
RISC pipeline. Give a pair of specific instruction sequences, one for the original
pipeline and one for the rearranged pipeline, to illustrate each way.
e. [15] <C.3> Assume that all instructions take 1 clock cycle per stage. List all of the ways
that the register-memory RISC can have a different CPI for a given program as
compared to the original RISC pipeline.
C.7 [10/10] <C.3> In this problem, we will explore how deepening the pipeline affects
performance in two ways: faster clock cycle and increased stalls due to data and control
hazards. Assume that the original machine is a 5-stage pipeline with a 1 ns clock cycle. The
second machine is a 12-stage pipeline with a 0.6 ns clock cycle. The 5-stage pipeline
experiences a stall due to a data hazard every 5 instructions, whereas the 12-stage pipeline
experiences 3 stalls every 8 instructions. In addition, branches constitute 20% of the instructions,
and the misprediction rate for both machines is 5%.
a. [10] <C.3> What is the speedup of the 12-stage pipeline over the 5-stage pipeline,
taking into account only data hazards?
b. [10] <C.3> If the branch mispredict penalty is 2 cycles for the first machine but
5 cycles for the second machine, what is the CPI of each, taking into account the stalls
due to branch mispredictions?
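One way to set up the arithmetic in C.7 is sketched below in Python. It assumes, as simplifications not stated in the exercise, that the ideal CPI is 1 and that each data-hazard stall costs exactly one cycle. The function and variable names are illustrative only.

```python
from fractions import Fraction

def cpi(stalls, instructions, branch_frac=Fraction(0),
        mispredict_rate=Fraction(0), penalty_cycles=0):
    """Assumed model: CPI = 1 + data-stall cycles/instr + mispredict cycles/instr."""
    data_stalls = Fraction(stalls, instructions)  # assumes 1 cycle per stall
    branch_stalls = branch_frac * mispredict_rate * penalty_cycles
    return 1 + data_stalls + branch_stalls

# Clock cycle times in ns, from the exercise statement.
CLOCK_5, CLOCK_12 = Fraction(1), Fraction(6, 10)

# Part (a): data hazards only.
cpi5_data = cpi(1, 5)     # 6/5
cpi12_data = cpi(3, 8)    # 11/8
speedup = (cpi5_data * CLOCK_5) / (cpi12_data * CLOCK_12)  # 16/11

# Part (b): add branch mispredictions (20% branches, 5% mispredicted).
BRANCHES, MISS = Fraction(20, 100), Fraction(5, 100)
cpi5_all = cpi(1, 5, BRANCHES, MISS, 2)    # 61/50
cpi12_all = cpi(3, 8, BRANCHES, MISS, 5)   # 57/40

print(f"speedup (data hazards only): {float(speedup):.3f}")
print(f"CPI with mispredictions: {float(cpi5_all):.3f} vs {float(cpi12_all):.3f}")
```

Exact fractions avoid rounding when the two machines are compared. Under a different stall model (for example, stalls lasting more than one cycle), the data-stall term would need to change.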
C.8 [15] <C.5> Create a table showing the forwarding logic for the R4000 integer pipeline
using the same format as that shown in Figure C.26. Include only the MIPS instructions we
considered in Figure C.26.
C.9 [15] <C.5> Create a table showing the R4000 integer hazard detection using the same
format as that shown in Figure C.25. Include only the MIPS instructions we considered in
Figure C.26.
C.10 [25] <C.5> Suppose MIPS had only one register set. Construct the forwarding table for
the FP and integer instructions using the format of Figure C.26. Ignore FP and integer
divides.
C.11 [15] <C.5> Construct a table like that shown in Figure C.25 to check for WAW stalls in
the MIPS FP pipeline of Figure C.35. Do not consider FP divides.
C.12 [20/22/22] <C.4, C.6> In this exercise, we will look at how a common vector loop runs
on statically and dynamically scheduled versions of the MIPS pipeline. The loop is the
so-called DAXPY loop (discussed extensively in Appendix G) and the central operation in
Gaussian elimination. The loop implements the vector operation Y = a * X + Y for a vector
of length 100. Here is the MIPS code for the loop:
foo:    L.D     F2, 0(R1)       ; load X(i)
        MUL.D   F4, F2, F0      ; multiply a*X(i)
        L.D     F6, 0(R2)       ; load Y(i)
        ADD.D   F6, F4, F6      ; add a*X(i) + Y(i)