Hardware Reference
19. Reconsider the previous problem. Are there any states that could be collapsed without
changing the result of any game? If so, which ones are equivalent?
20. Draw a finite-state machine for branch prediction that is more tenacious than Fig. 4-42.
It should change predictions only after three consecutive mispredictions.
21. The shift register of Fig. 4-27 has a maximum capacity of 6 bytes. Could a cheaper
version of the IFU be built with a 5-byte shift register? How about a 4-byte one?
22. Having examined cheaper IFUs in the previous question, now let us examine more
expensive ones. Would there ever be any point in having a much larger shift register in the
IFU, say, 12 bytes? Why or why not?
23. In the microprogram for the Mic-2, the code for if_icmpeq6 goes to T when Z is set to
1. However, the code at T is the same as goto1. Would it have been possible to go to
goto1 directly? Would doing so have made the machine faster?
24. In the Mic-4, the decoding unit maps the IJVM opcode onto the ROM index where the
corresponding micro-operations are stored. It would seem to be simpler to just omit
the decoding stage and feed the IJVM opcode into the queueing unit directly. It could use
the IJVM opcode as an index into the ROM, the same way as the Mic-1 works. What
is wrong with this plan?
25. Why are computers equipped with multiple layers of cache? Would it not be better to
simply have one big one?
26. A computer has a two-level cache. Suppose that 60% of the memory references hit on
the first level cache, 35% hit on the second level, and 5% miss. The access times are 5
nsec, 15 nsec, and 60 nsec, respectively, where the times for the level 2 cache and
memory start counting at the moment it is known that they are needed (e.g., a level 2
cache access does not even start until the level 1 cache miss occurs). What is the
average access time?
27. At the end of Sec. 4.5.1, we said that write allocation wins only if there are likely to be
multiple writes to the same cache line in a row. What about the case of a write
followed by multiple reads? Would that not also be a big win?
28. In the first draft of this topic, Fig. 4-39 showed a three-way associative cache instead
of a four-way associative cache. One of the reviewers threw a temper tantrum,
claiming that students would be horribly confused by this because 3 is not a power of 2 and
computers do everything in binary. Since the customer is always right, the figure was
changed to a four-way associative cache. Was the reviewer right? Discuss your
answer.
29. Many computer architects spend a lot of time making their pipelines deeper. Why?
30. A computer with a five-stage pipeline deals with conditional branches by stalling for
the next three cycles after hitting one. How much does stalling hurt the performance if
20% of all instructions are conditional branches? Ignore all sources of stalling except
conditional branches.
31. A computer prefetches up to 20 instructions in advance. However, on the average, four
of these are conditional branches, each with a probability of 90% of being predicted
correctly. What is the probability that the prefetching is on the right track?
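
The arithmetic behind Problems 26, 30, and 31 can be set up as in the sketch below. This is one possible reading of the problem statements, not an official solution. All function and parameter names are made up for illustration. Problem 26 is read as saying that each level's latency adds to the time already spent on the levels above it. Problem 30 assumes an ideal pipeline that completes one instruction per cycle when it does not stall. Problem 31 assumes that the branch predictions are independent of one another.

```python
# Illustrative helpers only; every name here is an assumption, not from the text.

def average_access_time(hit_percent, latency_nsec):
    """Expected access time when each level's timer starts only after the
    previous level is known to have missed (latencies accumulate)."""
    weighted_total = 0
    elapsed = 0
    for percent, latency in zip(hit_percent, latency_nsec):
        elapsed += latency                # time spent up to and including this level
        weighted_total += percent * elapsed
    return weighted_total / 100           # percentages -> fractions


def cycles_per_instruction(branch_percent, stall_cycles):
    """Average cycles per instruction if the only stalls come from branches
    and the pipeline otherwise completes one instruction per cycle."""
    return 1 + branch_percent * stall_cycles / 100


def probability_on_right_track(num_branches, p_correct):
    """Chance that every prefetched branch was predicted correctly,
    assuming the predictions are independent."""
    return p_correct ** num_branches


if __name__ == "__main__":
    # Problem 26: 60% / 35% / 5% of references, 5 / 15 / 60 nsec per level.
    print(average_access_time([60, 35, 5], [5, 15, 60]), "nsec")

    # Problem 30: 20% conditional branches, 3 stall cycles each.
    cpi = cycles_per_instruction(20, 3)
    print(cpi, "cycles per instruction;", 1 / cpi, "of ideal throughput")

    # Problem 31: four conditional branches, each 90% likely to be predicted correctly.
    print(probability_on_right_track(4, 0.9))
```

Keeping the percentages as integers until the final division avoids small floating-point rounding errors in the weighted sum.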