Hardware Reference
13. As a simple model of a bus-based multiprocessor system without caching, suppose that
one instruction in every four references memory, and that a memory reference occupies
the bus for an entire instruction time. If the bus is busy, the requesting CPU is put into
a FIFO queue. How much faster will a 64-CPU system run than a 1-CPU system?
14. The MESI cache coherence protocol has four states. Other write-back cache coherence
protocols have only three states. Which of the four MESI states could be sacrificed,
and what would the consequences of each choice be? If you had to pick only three
states, which would you pick?
15. Are there any situations with the MESI cache coherence protocol in which a cache line
is present in the local cache but for which a bus transaction is nevertheless needed? If
so, explain.
16. Suppose that there are n CPUs on a common bus. The probability that any CPU tries
to use the bus in a given cycle is p. What is the chance that
a. The bus is idle (0 requests)?
b. Exactly one request is made?
c. More than one request is made?
17. Name the major advantage and the major disadvantage of a crossbar switch.
18. How many crossbar switches does a full Sun Fire E25K have?
19. Suppose that the wire between switch 2A and switch 3B in the omega network of
Fig. 8-31 breaks. Who is cut off from whom?
20. Hot spots (heavily referenced memory locations) are clearly a major problem in
multistage switching networks. Are they also a problem in bus-based systems?
21. An omega switching network connects 4096 RISC CPUs, each with a 60-nsec cycle
time, to 4096 infinitely fast memory modules. The switching elements each have a
5-nsec delay. How many delay slots are needed by a LOAD instruction?
22. Consider a machine using an omega switching network, like the one shown in
Fig. 8-31. Suppose that the program and stack for processor i are kept in memory
module i. Propose a slight change in the topology that makes a large difference in the
performance (the IBM RP3 and BBN Butterfly use this modified topology). What
disadvantage does your new topology have compared to the original?
23. In a NUMA multiprocessor, local memory references take 20 nsec and remote
references 120 nsec. A certain program makes a total of N memory references during its
execution, of which 1 percent are to a page P. That page is initially remote, and it
takes C nsec to copy it locally. Under what conditions should the page be copied
locally in the absence of significant use by other processors?
24. Consider a CC-NUMA multiprocessor like that of Fig. 8-33 except with 512 nodes of
8 MB each. If the cache lines are 64 bytes, what is the percentage overhead for the
directories? Does increasing the number of nodes increase the overhead, decrease the
overhead, or leave it unchanged?
25. What is the difference between NC-NUMA and CC-NUMA?
26. For each topology shown in Fig. 8-37, compute the diameter of the network.
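
Problems 16, 21, and 23 reduce to short calculations, and the sketch below sets them up
in Python. It is not an official solution. The function names are illustrative, and the
code assumes that CPUs issue bus requests independently (Problem 16), that the omega
network is built from 2x2 switches with the request and the reply each crossing every
stage once (Problem 21), and that all references to page P occur after the copy
completes (Problem 23).

```python
import math


def bus_request_probabilities(n: int, p: float) -> tuple[float, float, float]:
    """Problem 16: n CPUs, each requesting the bus with probability p per cycle.

    Assumption (not stated in the problem): requests are independent.
    Returns (P(bus idle), P(exactly one request), P(more than one request)).
    """
    idle = (1 - p) ** n
    exactly_one = n * p * (1 - p) ** (n - 1)
    more_than_one = 1 - idle - exactly_one
    return idle, exactly_one, more_than_one


def omega_delay_slots(n_cpus: int, switch_delay_ns: float, cycle_ns: float) -> int:
    """Problem 21: CPU cycles that elapse before a LOAD's data comes back.

    Assumptions: 2x2 switching elements, so the network has ceil(log2(n_cpus))
    stages; the request and the reply each cross every stage once; the
    memory modules themselves add no delay.
    """
    stages = (n_cpus - 1).bit_length()  # ceil(log2(n_cpus)) for n_cpus >= 1
    round_trip_ns = 2 * stages * switch_delay_ns
    return math.ceil(round_trip_ns / cycle_ns)


def should_copy_page(
    n_refs: int,
    copy_cost_ns: float,
    local_ns: float = 20,
    remote_ns: float = 120,
    fraction_to_page: float = 0.01,
) -> bool:
    """Problem 23: True if copying page P locally lowers total access time.

    Assumption: every reference to P is made after the copy has finished.
    """
    page_refs = fraction_to_page * n_refs
    cost_if_left_remote = page_refs * remote_ns
    cost_if_copied = copy_cost_ns + page_refs * local_ns
    return cost_if_copied < cost_if_left_remote


if __name__ == "__main__":
    print(bus_request_probabilities(4, 0.5))
    print(omega_delay_slots(4096, 5, 60))
    print(should_copy_page(n_refs=1000, copy_cost_ns=500))
```

With the default timings of Problem 23, each reference to P that becomes local saves
100 nsec and 1 percent of the N references go to P, so the comparison in
should_copy_page reduces to asking whether C is smaller than N.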