Thread-Level Parallelism - Computer Architecture: A Quantitative Approach


5.3 Performance of Symmetric Shared-Memory Multiprocessors

In a multicore using a snooping coherence protocol, several different phenomena combine to

determine performance. In particular, the overall cache performance is a combination of the

behavior of uniprocessor cache miss traffic and the traffic caused by communication, which

results in invalidations and subsequent cache misses. Changing the processor count, cache

size, and block size can affect these two components of the miss rate in different ways, leading

to overall system behavior that is a combination of the two effects.

Appendix B breaks the uniprocessor miss rate into the three C's classification (capacity,

compulsory, and conflict) and provides insight into both application behavior and potential

improvements to the cache design. Similarly, the misses that arise from interprocessor

communication, which are often called coherence misses, can be broken into two separate sources.

The first source is the so-called true sharing misses that arise from the communication of data

through the cache coherence mechanism. In an invalidation-based protocol, the first write by a

processor to a shared cache block causes an invalidation to establish ownership of that block.

Additionally, when another processor attempts to read a modified word in that cache block, a

miss occurs and the resultant block is transferred. Both these misses are classified as true

sharing misses since they directly arise from the sharing of data among processors.

The second effect, called false sharing, arises from the use of an invalidation-based coherence

algorithm with a single valid bit per cache block. False sharing occurs when a block is

invalidated (and a subsequent reference causes a miss) because some word in the block, other than

the one being read, is written into. If the word written into is actually used by the processor

that received the invalidate, then the reference was a true sharing reference and would have

caused a miss independent of the block size. If, however, the word being written and the word

read are different and the invalidation does not cause a new value to be communicated, but

only causes an extra cache miss, then it is a false sharing miss. In a false sharing miss, the block

is shared, but no word in the cache is actually shared, and the miss would not occur if the

block size were a single word. The following example makes the sharing patterns clear.
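The distinction between the two kinds of coherence miss can be sketched as a small bookkeeping exercise. The Python below is only an illustration under stated assumptions, not the book's method: it models a single cache block under a simplified invalidation protocol with states M, S, and I, every processor starts with the block in the shared state, and the names `Copy`, `classify`, `touched`, and `stale` are hypothetical. A miss counts as true sharing when the accessed word was written by another processor while the local copy was invalid, or when a write invalidates a copy whose holder has actually used that word; any other coherence miss counts as false sharing. The event sequence is made up for illustration.

```python
from dataclasses import dataclass, field

TRUE, FALSE, HIT = "true sharing miss", "false sharing miss", "hit"


@dataclass
class Copy:
    """One processor's copy of the single cache block being modeled."""
    state: str = "S"                           # "M", "S", or "I"; all start shared (assumption)
    touched: set = field(default_factory=set)  # words this processor used while holding the copy
    stale: set = field(default_factory=set)    # words others wrote while this copy was invalid


def classify(events, initial_touched):
    """Label each (processor, op, word) event as a true sharing miss,
    a false sharing miss, or a hit. op is "read" or "write"."""
    copies = {p: Copy(touched=set(ws)) for p, ws in initial_touched.items()}
    results = []
    for proc, op, word in events:
        me = copies[proc]
        others = [c for p, c in copies.items() if p != proc]
        sharers = [c for c in others if c.state != "I"]

        # A coherence miss: the local copy is invalid, or a write must
        # invalidate other valid copies to gain ownership.
        # Simplification: a write to an S copy with no other sharers is a hit.
        is_miss = me.state == "I" or (op == "write" and me.state == "S" and bool(sharers))
        new_value_needed = me.state == "I" and word in me.stale
        kills_used_word = op == "write" and any(word in c.touched for c in sharers)

        if not is_miss:
            results.append(HIT)
        elif new_value_needed or kills_used_word:
            results.append(TRUE)
        else:
            results.append(FALSE)

        # Update protocol state after classifying the event.
        if me.state == "I":
            me.state, me.stale = "S", set()
            for c in sharers:
                c.state = "S"                  # a modified owner downgrades to shared
        me.touched.add(word)
        if op == "write":
            for c in others:
                if c.state != "I":
                    c.state, c.touched, c.stale = "I", set(), set()
                c.stale.add(word)              # remember what changed while invalid
            me.state = "M"
    return results


# Hypothetical sequence: P1 has so far used only x1, P2 only x2.
events = [
    ("P1", "write", "x1"),
    ("P2", "read", "x2"),
    ("P2", "read", "x1"),
    ("P1", "write", "x1"),
    ("P2", "read", "x1"),
    ("P1", "read", "x1"),
]
for event, kind in zip(events, classify(events, {"P1": {"x1"}, "P2": {"x2"}})):
    print(event, "->", kind)
```

In this run the first two events are false sharing misses, because neither processor uses the word the other one wrote. Once P2 reads x1, the later write by P1 and the re-read by P2 become true sharing misses.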

Example

Assume that words x1 and x2 are in the same cache block, which is in the

shared state in the caches of both P1 and P2. Assuming the following sequence

of events, identify each miss as a true sharing miss, a false sharing miss, or a hit.

Any miss that would occur if the block size were one word is designated a true

sharing miss.
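The one-word rule above depends on whether two distinct words map to the same cache block. A minimal sketch of that check follows, assuming byte addresses, word-aligned accesses, and power-of-two block sizes. The addresses for x1 and x2 and the helper names `block_index` and `can_false_share` are hypothetical and chosen only for illustration.

```python
def block_index(addr, block_bytes):
    """Index of the cache block that contains byte address addr."""
    return addr // block_bytes


def can_false_share(addr_a, addr_b, block_bytes):
    """True if two distinct word addresses fall in the same cache block,
    so a write to one can invalidate a cached copy of the other."""
    return addr_a != addr_b and block_index(addr_a, block_bytes) == block_index(addr_b, block_bytes)


# Hypothetical addresses: x1 and x2 are adjacent 4-byte words.
X1, X2 = 0x1000, 0x1004

print(can_false_share(X1, X2, 64))  # True: both words sit in one 64-byte block
print(can_false_share(X1, X2, 4))   # False: with one-word blocks they are separate
```

With one-word blocks the two words never share a block, so any miss that remains must come from communication of the same word.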
