Pipelining: Basic and Intermediate Concepts - Computer Architecture: A Quantitative Approach


a. [12] <C.1> List a rearranged order of the five traditional stages of the RISC pipeline that will support register-memory operations implemented exclusively by register indirect addressing.

b. [13] <C.2, C.3> Describe what new forwarding paths are needed for the rearranged pipeline by stating the source, destination, and information transferred on each needed new path.

c. [13] <C.2, C.3> For the reordered stages of the RISC pipeline, what new data hazards are created by this addressing mode? Give an instruction sequence illustrating each new hazard.

d. [15] <C.3> List all of the ways that the RISC pipeline with register-memory ALU operations can have a different instruction count for a given program than the original RISC pipeline. Give a pair of specific instruction sequences, one for the original pipeline and one for the rearranged pipeline, to illustrate each way.

e. [15] <C.3> Assume that all instructions take 1 clock cycle per stage. List all of the ways that the register-memory RISC can have a different CPI for a given program as compared to the original RISC pipeline.

C.7 [10/10] <C.3> In this problem, we will explore how deepening the pipeline affects performance in two ways: faster clock cycle and increased stalls due to data and control hazards. Assume that the original machine is a 5-stage pipeline with a 1 ns clock cycle. The second machine is a 12-stage pipeline with a 0.6 ns clock cycle. The 5-stage pipeline experiences a stall due to a data hazard every 5 instructions, whereas the 12-stage pipeline experiences 3 stalls every 8 instructions. In addition, branches constitute 20% of the instructions, and the misprediction rate for both machines is 5%.

a. [10] <C.3> What is the speedup of the 12-stage pipeline over the 5-stage pipeline, taking into account only data hazards?

b. [10] <C.3> If the branch mispredict penalty is 2 cycles for the first machine but 5 cycles for the second machine, what is the CPI of each, taking into account the stalls due to branch mispredictions?
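The CPI arithmetic behind C.7 can be sketched as below. This is a minimal illustration under assumptions the exercise does not state explicitly: both pipelines have an ideal CPI of 1, every data-hazard stall and every mispredict-penalty cycle costs one full clock, and the part (b) CPIs also include the data-hazard stalls from part (a). The function names are hypothetical.

```python
# Sketch of the C.7 CPI/speedup arithmetic.
# ASSUMPTIONS (not given in the exercise): ideal CPI = 1, each stall costs one
# clock cycle, and part (b) CPIs include the data-hazard stalls of part (a).

def cpi(data_stalls_per_instr, branch_freq=0.0, mispredict_rate=0.0, penalty=0):
    """Ideal CPI of 1, plus data-hazard stalls, plus branch-mispredict stalls."""
    branch_stalls = branch_freq * mispredict_rate * penalty
    return 1.0 + data_stalls_per_instr + branch_stalls

def time_per_instr_ns(cpi_value, clock_ns):
    """Average execution time per instruction = CPI * clock cycle time."""
    return cpi_value * clock_ns

# Part (a): data hazards only
cpi5_a = cpi(1 / 5)    # 5-stage: one stall every 5 instructions
cpi12_a = cpi(3 / 8)   # 12-stage: three stalls every 8 instructions
speedup_a = time_per_instr_ns(cpi5_a, 1.0) / time_per_instr_ns(cpi12_a, 0.6)

# Part (b): add mispredict stalls (20% branches, 5% of them mispredicted)
cpi5_b = cpi(1 / 5, 0.20, 0.05, 2)    # 2-cycle penalty
cpi12_b = cpi(3 / 8, 0.20, 0.05, 5)   # 5-cycle penalty

print(f"(a) CPI5={cpi5_a:.3f} CPI12={cpi12_a:.3f} speedup={speedup_a:.3f}")
print(f"(b) CPI5={cpi5_b:.3f} CPI12={cpi12_b:.3f}")
```

Under these assumptions the 12-stage machine comes out roughly 1.45 times faster on data hazards alone (1.2 ns versus 0.825 ns per instruction), and the part (b) CPIs are 1.22 and 1.425. A different reading of part (b), such as counting only branch stalls, would change the second result.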

C.8 [15] <C.5> Create a table showing the forwarding logic for the R4000 integer pipeline using the same format as that shown in Figure C.26. Include only the MIPS instructions we considered in Figure C.26.

C.9 [15] <C.5> Create a table showing the R4000 integer hazard detection using the same format as that shown in Figure C.25. Include only the MIPS instructions we considered in Figure C.26.

C.10 [25] <C.5> Suppose MIPS had only one register set. Construct the forwarding table for the FP and integer instructions using the format of Figure C.26. Ignore FP and integer divides.

C.11 [15] <C.5> Construct a table like that shown in Figure C.25 to check for WAW stalls in the MIPS FP pipeline of Figure C.35. Do not consider FP divides.

C.12 [20/22/22] <C.4, C.6> In this exercise, we will look at how a common vector loop runs on statically and dynamically scheduled versions of the MIPS pipeline. The loop is the so-called DAXPY loop (discussed extensively in Appendix G) and the central operation in Gaussian elimination. The loop implements the vector operation Y = a * X + Y for a vector of length 100. Here is the MIPS code for the loop:

foo:  L.D    F2, 0(R1)    ; load X(i)
      MUL.D  F4, F2, F0   ; multiply a*X(i)
      L.D    F6, 0(R2)    ; load Y(i)
      ADD.D  F6, F4, F6   ; add a*X(i) + Y(i)
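As a reference for checking any schedule of the loop, the vector operation it computes (Y = a * X + Y over 100 elements) can be sketched in Python. This is an illustrative model only: the function name `daxpy`, the scalar value, and the sample data are hypothetical, and unlike the MIPS code (which writes Y back to memory) the sketch returns a new list instead of updating Y in place.

```python
# Reference model of the DAXPY computation Y = a*X + Y.
# HYPOTHETICAL: the name daxpy, the scalar a = 2.0, and the sample X/Y data
# are illustrative choices, not values from the exercise.

def daxpy(a, x, y):
    """Return a new list whose i-th element is a*x[i] + y[i]."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same length")
    result = list(y)
    for i in range(len(x)):
        prod = a * x[i]           # corresponds to MUL.D F4, F2, F0 (a*X(i))
        result[i] = prod + y[i]   # corresponds to ADD.D F6, F4, F6 (a*X(i) + Y(i))
    return result

N = 100                           # vector length stated in the exercise
a = 2.0                           # hypothetical scalar; the MIPS code keeps a in F0
x = [float(i) for i in range(N)]  # hypothetical X(i) = i
y = [1.0] * N                     # hypothetical Y(i) = 1
y_new = daxpy(a, x, y)
print(y_new[:3], y_new[-1])
```

Each iteration performs one multiply and one dependent add, which is the data dependence (MUL.D feeding ADD.D through F4) that the static and dynamic schedules in this exercise must accommodate.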
