Time  P1        P2
1     Write x1
2               Read x2
3     Write x1
4               Write x2
5     Read x2
Answer
Here are the classifications by time step:
1. This event is a true sharing miss, since x1 was read by P2 and needs to be
invalidated from P2.
2. This event is a false sharing miss, since x2 was invalidated by the write of
x1 in P1, but that value of x1 is not used in P2.
3. This event is a false sharing miss, since the block containing x1 is marked
shared due to the read in P2, but P2 did not read x1. The cache block
containing x1 will be in the shared state after the read by P2; a write miss is
required to obtain exclusive access to the block. In some protocols this will be
handled as an upgrade request, which generates a bus invalidate but does
not transfer the cache block.
4. This event is a false sharing miss for the same reason as step 3.
5. This event is a true sharing miss, since the value being read was written by
P2.
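The classifications above can be reproduced with a small simulation. The sketch below is only an illustration under stated assumptions: x1 and x2 share a single cache block, the block starts in the shared state in both caches, P2 has already read x1 before time 1 (as the reasoning for step 1 implies), and the protocol is write-invalidate. The function name `classify`, the `used` and `stale` bookkeeping, and the word-level rule itself are hypothetical choices made to mirror the reasoning in the answer, not a general definition of true and false sharing.

```python
# Simplified sketch: one cache block holding x1 and x2, write-invalidate protocol.
# The classification rule is an assumption chosen to mirror the answer's reasoning.

EVENTS = [
    ("P1", "write", "x1"),
    ("P2", "read", "x2"),
    ("P1", "write", "x1"),
    ("P2", "write", "x2"),
    ("P1", "read", "x2"),
]

def classify(events, initially_used):
    procs = sorted(initially_used)
    state = {p: "S" for p in procs}  # assumed: block starts shared in every cache
    used = {p: set(initially_used[p]) for p in procs}  # words touched in the current copy
    stale = {p: set() for p in procs}  # words written by others since p was invalidated
    results = []
    for proc, op, word in events:
        others = [q for q in procs if q != proc]
        if op == "read":
            if state[proc] != "I":
                results.append("hit")
            else:
                # True sharing only if the word being read was written elsewhere.
                true_sharing = word in stale[proc]
                results.append("true sharing miss" if true_sharing else "false sharing miss")
                stale[proc].clear()
                for q in others:
                    if state[q] == "M":
                        state[q] = "S"  # the owner downgrades to shared
                state[proc] = "S"
        else:  # write
            if state[proc] == "M":
                results.append("hit")
            else:
                # True sharing if this word is in use in another valid copy,
                # or was written elsewhere while this copy was invalid.
                true_sharing = word in stale[proc] or any(
                    state[q] != "I" and word in used[q] for q in others
                )
                results.append("true sharing miss" if true_sharing else "false sharing miss")
                stale[proc].clear()
                state[proc] = "M"
            for q in others:
                if state[q] != "I":
                    state[q] = "I"  # invalidate the other copies
                    used[q].clear()
                stale[q].add(word)
        used[proc].add(word)
    return results

if __name__ == "__main__":
    # Assumption: before time 1, P2 has read x1 and P1 has touched neither word.
    outcome = classify(EVENTS, {"P1": set(), "P2": {"x1"}})
    for time, ((proc, op, word), kind) in enumerate(zip(EVENTS, outcome), start=1):
        print(time, proc, op, word, "->", kind)
```

Changing the assumed initial state changes the result: if P2 had not read x1 beforehand, the same rule would label step 1 a false sharing miss.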
Although we will see the effects of true and false sharing misses in commercial workloads,
the role of coherence misses is more significant for tightly coupled applications that share
significant amounts of user data. We examine their effects in detail in Appendix I, when we
consider the performance of a parallel scientific workload.
A Commercial Workload
In this section, we examine the memory system behavior of a four-processor shared-memory
multiprocessor when running a general-purpose commercial workload. The study we
examine was done with a four-processor Alpha system in 1998, but it remains the most
comprehensive and insightful study of the performance of a multiprocessor for such workloads. The
results were collected either on an AlphaServer 4100 or using a configurable simulator modeled
after the AlphaServer 4100. Each processor in the AlphaServer 4100 is an Alpha 21164, which
issues up to four instructions per clock and runs at 300 MHz. Although the clock rate of the
Alpha processor in this system is considerably slower than processors in systems designed in
2011, the basic structure of the system, consisting of a four-issue processor and a three-level