Instruction-Level Parallelism and Its Exploitation - Computer Architecture: A Quantitative Approach


a. [20] <3.4-3.5> For this problem use the single-issue Tomasulo MIPS pipeline of Figure 3.6 with the pipeline latencies from the table above. Show the number of stall cycles for each instruction and the clock cycle in which each instruction begins execution (i.e., enters its first EX cycle) for three iterations of the loop. How many cycles does each loop iteration take? Report your answer in the form of a table with the following column headers:

■ Iteration (loop iteration number)

■ Instruction

■ Issues (cycle when instruction issues)

■ Executes (cycle when instruction executes)

■ Memory access (cycle when memory is accessed)

■ Write CDB (cycle when result is written to the CDB)

■ Comment (description of any event on which the instruction is waiting)

Show three iterations of the loop in your table. You may ignore the first instruction.

b. [20] <3.7, 3.8> Repeat part (a) but this time assume a two-issue Tomasulo algorithm

and a fully pipelined floating-point unit (FPU).

3.16 [10] <3.4> Tomasulo's algorithm has a disadvantage: only one result can complete per

clock per CDB. Use the hardware configuration and latencies from the previous question

and find a code sequence of no more than 10 instructions where Tomasulo's algorithm

must stall due to CDB contention. Indicate where this occurs in your sequence.
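To make the notion of CDB contention concrete, the following is a minimal sketch of bus arbitration, not a model of the full Tomasulo pipeline. The function name `arbitrate_cdb`, the oldest-first priority rule, and the instruction labels and ready cycles in the usage note are assumptions made for illustration; they are not taken from the exercise's latency table.

```python
def arbitrate_cdb(ready_cycles, num_cdbs=1):
    """Assign a CDB write cycle to each instruction.

    ready_cycles: list of (name, cycle) pairs in program order, where
    cycle is the first clock in which the result could be written.
    Priority is oldest-first (an assumption of this sketch).
    Returns a dict mapping name -> cycle in which it writes the CDB.
    """
    writes = {}
    used = {}  # cycle -> number of CDB slots already granted
    for name, ready in ready_cycles:
        cycle = ready
        # Stall one cycle at a time while every CDB is already taken.
        while used.get(cycle, 0) >= num_cdbs:
            cycle += 1
        used[cycle] = used.get(cycle, 0) + 1
        writes[name] = cycle
    return writes


def stalled(ready_cycles, writes):
    """Names of instructions whose CDB write was delayed by contention."""
    return [name for name, ready in ready_cycles if writes[name] > ready]
```

For example, if two hypothetical instructions `i1` and `i2` both have results ready in cycle 7, a single CDB lets `i1` write in cycle 7 and forces `i2` to wait until cycle 8, so `stalled` reports `i2`. With `num_cdbs=2` neither instruction stalls.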

3.17 [20] <3.3> An (m, n) correlating branch predictor uses the behavior of the most recent m executed branches to choose from 2^m predictors, each of which is an n-bit predictor. A two-level local predictor works in a similar fashion, but only keeps track of the past behavior of each individual branch to predict future behavior.
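The (m, n) correlating scheme described above can be sketched as follows. This is an illustrative model under stated assumptions, not the book's reference implementation: the class name `CorrelatingPredictor`, indexing the table by `pc % entries`, and the default initial counter and history values are all choices made for this sketch.

```python
class CorrelatingPredictor:
    """Sketch of an (m, n) correlating predictor.

    m bits of global history select one of 2**m n-bit saturating
    counters inside the table entry chosen by the branch address.
    """

    def __init__(self, m, n, entries, init_counter=0, init_history=0):
        self.m, self.n, self.entries = m, n, entries
        self.max_count = (1 << n) - 1
        self.history_mask = (1 << m) - 1
        self.history = init_history & self.history_mask
        # One row per tracked branch, 2**m counters per row.
        self.table = [[init_counter] * (1 << m) for _ in range(entries)]

    def storage_bits(self):
        """Counter storage only: entries * 2**m * n bits."""
        return self.entries * (1 << self.m) * self.n

    def predict(self, pc):
        """Predict taken when the selected counter is in its upper half."""
        counter = self.table[pc % self.entries][self.history]
        return counter >= (1 << (self.n - 1))

    def update(self, pc, taken):
        """Train the selected counter, then shift the outcome into history."""
        row = self.table[pc % self.entries]
        c = row[self.history]
        row[self.history] = min(c + 1, self.max_count) if taken else max(c - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
```

With `m=1`, `n=2`, and four entries, `storage_bits()` returns 4 × 2 × 2 = 16. A local predictor would differ mainly in keeping a separate history register per table entry instead of the single shared `history` field used here.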

There is a design trade-off involved with such predictors: Correlating predictors require little memory for history, which allows them to maintain 2-bit predictors for a large number of individual branches (reducing the probability of branch instructions reusing the same predictor), while local predictors require substantially more memory to keep history and are thus limited to tracking a relatively small number of branch instructions. For this exercise, consider a (1,2) correlating predictor that can track four branches (requiring 16 bits) versus a (1,2) local predictor that can track two branches using the same amount of memory. For the following branch outcomes, provide each prediction, the table entry used to make the prediction, any updates to the table as a result of the prediction, and the final misprediction rate of each predictor. Assume that all branches up to this point have been taken. Initialize each predictor to the following:
