FIGURE 5.8 A multicore single-chip multiprocessor with uniform memory access
through a banked shared cache and using an interconnection network rather than a
bus.
The AMD Opteron represents another intermediate point in the spectrum between a snooping
and a directory protocol. Memory is directly connected to each multicore chip, and up to
four multicore chips can be connected. The system is a NUMA, since local memory is somewhat
faster than remote memory. The Opteron implements its coherence protocol using the
point-to-point links to broadcast to up to three other chips. Because the interprocessor
links are not shared, the only way a processor can know when an invalidate operation has
completed is by an explicit acknowledgment. Thus, the coherence protocol uses a broadcast to
find potentially shared copies, like a snooping protocol, but uses the acknowledgments to
order operations, like a directory protocol. Because local memory is only somewhat faster
than remote memory in the Opteron implementation, some software treats an Opteron
multiprocessor as having uniform memory access.
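The broadcast-plus-acknowledgment idea described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the actual Opteron protocol: the `Chip` class and the functions `handle_invalidate` and `broadcast_invalidate` are hypothetical names introduced for illustration, each cache is reduced to a set of block addresses, and message timing, cache states, and memory are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Chip:
    """Hypothetical multicore chip; its cache is just a set of block addresses."""
    chip_id: int
    cached_blocks: set = field(default_factory=set)

    def handle_invalidate(self, block: int) -> tuple:
        """Drop any local copy of `block` and return an explicit acknowledgment."""
        had_copy = block in self.cached_blocks
        self.cached_blocks.discard(block)
        return (self.chip_id, had_copy)


def broadcast_invalidate(writer: Chip, chips: list, block: int) -> list:
    """Send an invalidate over each point-to-point link and collect the acks.

    With no shared bus to observe, the writer treats the invalidate as
    complete only after every other chip has acknowledged it.
    """
    others = [c for c in chips if c is not writer]
    # One message per link, as in a snooping-style broadcast.
    acks = [c.handle_invalidate(block) for c in others]
    # Completion is defined by the acknowledgments, as in a directory protocol.
    assert len(acks) == len(others)
    writer.cached_blocks.add(block)  # writer now holds the only copy
    return acks


# Four chips (assumed maximum from the text); chips 1 and 3 share block 0x40.
chips = [Chip(i) for i in range(4)]
chips[1].cached_blocks.add(0x40)
chips[3].cached_blocks.add(0x40)

acks = broadcast_invalidate(chips[0], chips, 0x40)
print(acks)  # [(1, True), (2, False), (3, True)]
```

Note that chip 2 still has to receive and acknowledge the invalidate even though it holds no copy of the block. That is the cost of using a broadcast to find sharers instead of a directory that records them.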
A snooping cache coherence protocol can be used without a centralized bus, but it still
requires that a broadcast be done to snoop the individual caches on every miss to a
potentially shared cache block. This cache coherence traffic creates another limit on the
scale and the speed of the processors. Because coherence traffic is unaffected by larger
caches, faster processors will inevitably overwhelm the network and the ability of each
cache to respond to snoop requests from all the other caches. In Section 5.4, we examine
directory-based protocols,
 