a. [20] <3.4-3.5> For this problem use the single-issue Tomasulo MIPS pipeline of Figure
3.6 with the pipeline latencies from the table above. Show the number of stall cycles
for each instruction and what clock cycle each instruction begins execution (i.e., enters
its first EX cycle) for three iterations of the loop. How many cycles does each loop
iteration take? Report your answer in the form of a table with the following column
headers:
■ Iteration (loop iteration number)
■ Instruction
■ Issues (cycle when instruction issues)
■ Executes (cycle when instruction executes)
■ Memory access (cycle when memory is accessed)
■ Write CDB (cycle when result is written to the CDB)
■ Comment (description of any event on which the instruction is waiting)
Show three iterations of the loop in your table. You may ignore the first instruction.
b. [20] <3.7, 3.8> Repeat part (a) but this time assume a two-issue Tomasulo algorithm
and a fully pipelined floating-point unit (FPU).
3.16 [10] <3.4> Tomasulo's algorithm has a disadvantage: only one result can be broadcast per
clock cycle per CDB. Use the hardware configuration and latencies from the previous question
and find a code sequence of no more than 10 instructions where Tomasulo's algorithm
must stall due to CDB contention. Indicate where this occurs in your sequence.
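One way to check a candidate sequence for CDB contention is sketched below in Python. It is only an illustration: the function name `cdb_conflicts`, the tuple layout, and the example latencies are hypothetical and are not taken from the latency table the exercise refers to. The sketch assumes a result is ready to be written on the cycle after its last execution cycle.

```python
from collections import defaultdict


def cdb_conflicts(instructions, num_cdbs=1):
    """Return {cycle: [names]} for cycles where more results want a CDB than exist.

    Each instruction is (name, exec_start_cycle, exec_latency_in_cycles).
    Assumption: a result is ready to broadcast on cycle exec_start + latency,
    that is, the cycle after its final execution cycle.
    """
    ready = defaultdict(list)
    for name, exec_start, latency in instructions:
        ready[exec_start + latency].append(name)
    # Keep only cycles that are oversubscribed.
    return {cycle: names for cycle, names in ready.items() if len(names) > num_cdbs}


# Hypothetical latencies, chosen only to demonstrate a collision.
example = [
    ("MUL.D", 1, 6),   # executes cycles 1-6, wants the CDB in cycle 7
    ("ADD.D", 3, 4),   # executes cycles 3-6, wants the CDB in cycle 7
    ("DADDIU", 5, 1),  # executes cycle 5, wants the CDB in cycle 6
]
print(cdb_conflicts(example))
```

Any cycle that appears in the returned dictionary is one where at least one instruction would have to wait for the bus. Substituting the actual latencies and execution start cycles from the exercise is left to the reader.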
3.17 [20] <3.3> An (m, n) correlating branch predictor uses the behavior of the most recent m
executed branches to choose from 2^m predictors, each of which is an n-bit predictor. A two-
level local predictor works in a similar fashion, but only keeps track of the past behavior of
each individual branch to predict future behavior.
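The (m, n) correlating scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `CorrelatingPredictor`, the helper `misprediction_rate`, the modulo indexing by branch address, and the all-taken starting state are choices made here for the example.

```python
class CorrelatingPredictor:
    """Sketch of an (m, n) correlating predictor with one global history register."""

    def __init__(self, m, n, num_entries):
        self.m = m
        self.n = n
        self.num_entries = num_entries
        self.max_counter = (1 << n) - 1
        # Assumption: every n-bit saturating counter starts at "strongly taken".
        # table[entry][history] selects one of 2^m counters for that entry.
        self.table = [[self.max_counter] * (1 << m) for _ in range(num_entries)]
        # Assumption: the last m branches are treated as taken at start-up.
        self.history = (1 << m) - 1

    def storage_bits(self):
        # Counter storage only; the m-bit history register is not counted.
        return self.num_entries * (1 << self.m) * self.n

    def _entry(self, pc):
        # Assumption: entries are selected by the low bits of the branch address.
        return pc % self.num_entries

    def predict(self, pc):
        counter = self.table[self._entry(pc)][self.history]
        return counter >= (1 << (self.n - 1))  # upper half of the range means taken

    def update(self, pc, taken):
        entry = self._entry(pc)
        counter = self.table[entry][self.history]
        if taken:
            counter = min(counter + 1, self.max_counter)
        else:
            counter = max(counter - 1, 0)
        self.table[entry][self.history] = counter
        # Shift the newest outcome into the m-bit global history.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def misprediction_rate(predictor, trace):
    """trace is a list of (pc, taken) pairs; returns the fraction mispredicted."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses / len(trace)
```

For instance, `CorrelatingPredictor(1, 2, 4).storage_bits()` evaluates 4 entries times 2^1 counters times 2 bits, which gives 16 bits of counter storage. A two-level local predictor would differ by keeping a separate history per table entry instead of the single shared `history` field used here.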
There is a design trade-off involved with such predictors: Correlating predictors require
little memory for history, which allows them to maintain 2-bit predictors for a large number
of individual branches (reducing the probability of branch instructions reusing the same
predictor), while local predictors require substantially more memory to keep history and
are thus limited to tracking a relatively small number of branch instructions. For this
exercise, consider a (1,2) correlating predictor that can track four branches (requiring 16
bits) versus a (1,2) local predictor that can track two branches using the same amount of
memory. For the following branch outcomes, provide each prediction, the table entry used
to make the prediction, any updates to the table as a result of the prediction, and the final
misprediction rate of each predictor. Assume that all branches up to this point have been
taken. Initialize each predictor to the following: