S.D 0(R2), F6 ; store Y(i)
DADDIU R1, R1, #8 ; increment X index
DADDIU R2, R2, #8 ; increment Y index
SGTIU R3, R1, done ; test if done
BEQZ R3, foo ; loop if not done
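For orientation, the loop whose tail appears above is referred to later in this exercise as DAXPY, and its comments mention X(i) and Y(i). The following is a minimal Python sketch of what such a loop computes, assuming the usual DAXPY definition Y(i) = a*X(i) + Y(i). The function name `daxpy` and its parameters are illustrative and do not come from the exercise.

```python
def daxpy(a, x, y):
    """Return a new list whose i-th element is a * x[i] + y[i].

    Assumption: this mirrors the usual DAXPY definition. Each list element
    stands for one 8-byte double, which is why the assembly loop advances
    both index registers by 8 per iteration.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]
```

Each element of the result corresponds to one trip through the assembly loop, which ends with the store to Y(i), the two index increments, the comparison, and the branch.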
For parts (a) to (c), assume that integer operations issue and complete in 1 clock cycle
(including loads) and that their results are fully bypassed. Ignore the branch delay. You
will use the FP latencies (only) shown in Figure C.34, but assume that the FP unit is fully
pipelined. For scoreboards below, assume that an instruction waiting for a result from
another function unit can pass through read operands at the same time the result is written.
Also assume that an instruction in WR completing will allow a currently active instruction
that is waiting on the same functional unit to issue in the same clock cycle in which the first
instruction completes WR.
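When working out the timing by hand, it can help to have a small stall calculator. The sketch below assumes that "latency" means the number of intervening cycles required between a producing instruction and a dependent consumer, which is what the truncated caption of Figure C.59 appears to describe. It covers only RAW stalls in an in-order, fully bypassed pipeline and ignores structural hazards. The function name `raw_stall_cycles` is hypothetical, and the latency values passed in are placeholders, not the entries of Figure C.34.

```python
def raw_stall_cycles(latency: int, distance: int = 1) -> int:
    """Stall cycles a dependent instruction needs before it can proceed.

    latency  -- intervening cycles required between producer and consumer
                (assumed definition; take real values from the figure).
    distance -- how many issue slots after the producer the consumer sits
                (1 means it is the very next instruction).
    """
    if latency < 0:
        raise ValueError("latency must be non-negative")
    if distance < 1:
        raise ValueError("distance must be at least 1")
    # Each independent instruction between the two hides one cycle of latency.
    return max(0, latency - (distance - 1))
```

For example, with a placeholder latency of 3, a consumer placed immediately after its producer would wait 3 cycles, while one placed two slots later would wait 2. The distance here counts slots in program order, so stalls already inserted for earlier instructions must be accounted for separately.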
a. [20] <C.5> For this problem, use the MIPS pipeline of Section C.5 with the pipeline
latencies from Figure C.34, but a fully pipelined FP unit, so the initiation interval is
1. Draw a timing diagram, similar to Figure C.37, showing the timing of each
instruction's execution. How many clock cycles does each loop iteration take, counting from
when the first instruction enters the WB stage to when the last instruction enters the
WB stage?
b. [22] <C.6> Using the MIPS code for DAXPY above, show the state of the scoreboard
tables (as in Figure C.56) when the SGTIU instruction reaches write result. Assume
that issue and read operands each take a cycle. Assume that there is one integer
functional unit that takes only a single execution cycle (the latency to use is 0 cycles,
including loads and stores). Assume the FP unit configuration of Figure C.54 with the
FP latencies of Figure C.34. The branch should not be included in the scoreboard.
c. [22] <C.6> Using the MIPS code for DAXPY above, assume a scoreboard with the FP
functional units described in Figure C.54, plus one integer functional unit (also used
for load-store). Assume the latencies shown in Figure C.59. Show the state of the
scoreboard (as in Figure C.56) when the branch issues for the second time. Assume that the
branch was correctly predicted taken and took 1 cycle. How many clock cycles does
each loop iteration take? You may ignore any register port/bus conflicts.
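Parts (b) and (c) ask for scoreboard tables. As a rough aid for bookkeeping, the sketch below models one functional-unit status entry and the issue check of a classic scoreboard: an instruction issues only if its unit is free and no active instruction already targets the same destination register. The field names follow the conventional Busy/Op/Fi/Fj/Fk/Qj/Qk/Rj/Rk layout, which is assumed here rather than taken from Figure C.56, and the class name, function names, and unit name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FunctionalUnit:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # first source register
    fk: Optional[str] = None  # second source register
    qj: Optional[str] = None  # unit that will produce fj, if still pending
    qk: Optional[str] = None  # unit that will produce fk, if still pending
    rj: bool = False          # True when fj is ready to be read
    rk: bool = False          # True when fk is ready to be read


def can_issue(units: Dict[str, FunctionalUnit], result: Dict[str, str],
              unit: str, dest: Optional[str]) -> bool:
    """Issue only if the unit is free and no pending write targets dest."""
    return not units[unit].busy and (dest is None or dest not in result)


def issue(units: Dict[str, FunctionalUnit], result: Dict[str, str], unit: str,
          op: str, dest: Optional[str], src1: Optional[str],
          src2: Optional[str]) -> bool:
    """Fill in the unit's status entry; return False if issue must stall."""
    if not can_issue(units, result, unit, dest):
        return False
    fu = units[unit]
    fu.busy, fu.op, fu.fi, fu.fj, fu.fk = True, op, dest, src1, src2
    # Look up producers before recording this instruction's own destination.
    fu.qj = result.get(src1) if src1 else None
    fu.qk = result.get(src2) if src2 else None
    fu.rj = fu.qj is None
    fu.rk = fu.qk is None
    if dest is not None:
        result[dest] = unit
    return True


# Illustrative use with a single integer unit, as part (b) assumes.
units = {"Integer": FunctionalUnit()}
register_result: Dict[str, str] = {}
issued_daddiu = issue(units, register_result, "Integer", "DADDIU", "R1", "R1", None)
issued_sgtiu = issue(units, register_result, "Integer", "SGTIU", "R3", "R1", None)
```

In this toy run the DADDIU occupies the only integer unit, so the SGTIU cannot issue until that unit is released. The read-operands, execute, and write-result steps are deliberately left out; they would clear `busy` and remove the destination from `register_result`.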
FIGURE C.59 Pipeline latencies where latency is number
C.13 [25] <C.8> It is critical that the scoreboard be able to distinguish RAW and WAR hazards,
because a WAR hazard requires stalling the instruction doing the writing until the
instruction reading an operand initiates execution, but a RAW hazard requires delaying the read-
 