Thread-Level Parallelism - Computer Architecture: A Quantitative Approach

FIGURE 5.7 Cache coherence state diagram with the state transitions induced by the local processor shown in black and by the bus activities shown in gray. As in Figure 5.6, the activities on a transition are shown in bold.
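The transitions that such a diagram encodes can be written as two lookup functions: one for requests from the local processor and one for requests snooped on the bus. The sketch below is minimal and assumes a three-state (invalid, shared, modified) write-invalidate, write-back protocol. The names `State`, `processor_event`, and `bus_event`, and the message strings, are hypothetical. A miss caused by replacing a different block is not modeled.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


def processor_event(state, op):
    """Transition caused by the local processor for one cache block.

    op is "read" or "write". Returns (next_state, bus_action), where
    bus_action is the request placed on the bus, or None for a hit.
    Assumption: a miss is modeled only as the block being INVALID.
    """
    if op == "read":
        if state is State.INVALID:
            return State.SHARED, "read miss"      # fetch a readable copy
        return state, None                        # read hit in S or M
    if op == "write":
        if state is State.INVALID:
            return State.MODIFIED, "write miss"   # fetch block with ownership
        if state is State.SHARED:
            return State.MODIFIED, "invalidate"   # upgrade: kill other copies
        return State.MODIFIED, None               # write hit in M
    raise ValueError(f"unknown processor op: {op!r}")


def bus_event(state, msg):
    """Transition caused by another cache's request snooped on the bus.

    Returns (next_state, action), where action is what this cache must do.
    """
    if msg == "read miss":
        if state is State.MODIFIED:
            return State.SHARED, "write-back block"   # supply dirty data, keep a copy
        return state, None
    if msg == "write miss":
        if state is State.MODIFIED:
            return State.INVALID, "write-back block"  # supply dirty data, drop copy
        return State.INVALID, None
    if msg == "invalidate":
        # Only a cache holding a SHARED copy issues an invalidate, so no other
        # cache can hold the block in MODIFIED at that moment.
        if state is State.MODIFIED:
            raise RuntimeError("invalidate snooped for a MODIFIED block")
        return State.INVALID, None
    raise ValueError(f"unknown bus message: {msg!r}")


# Usage: processor P0 reads a block, then processor P1 writes it.
p0, p1 = State.INVALID, State.INVALID
p0, req = processor_event(p0, "read")   # P0 -> SHARED, places "read miss"
p1, _ = bus_event(p1, req)              # P1 has no copy, stays INVALID
p1, req = processor_event(p1, "write")  # P1 -> MODIFIED, places "write miss"
p0, _ = bus_event(p0, req)              # P0's shared copy becomes INVALID
```

Each call is treated as one indivisible step. That is exactly the atomicity assumption that the following discussion questions.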

Although our simple cache protocol is correct, it omits a number of complications that make the implementation much trickier. The most important of these is that the protocol assumes that operations are atomic, that is, that an operation can be done in such a way that no intervening operation can occur. For example, the protocol described assumes that a write miss can be detected, can acquire the bus, and can receive a response as a single atomic action. In reality, this is not the case. In fact, even a read miss might not be atomic: after detecting a miss in the L2 of a multicore, the core must arbitrate for access to the bus connecting to the shared L3. Nonatomic actions introduce the possibility that the protocol can deadlock, meaning that it reaches a state where it cannot continue. We will explore these complications later in this section and when we examine distributed shared-memory (DSM) designs.
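The sketch below makes the nonatomicity concrete. It assumes a bus on which a write to a SHARED block is split into two steps: the cache first queues a request, and later it wins arbitration. The class `Cache`, the transient state names `SM_pending` and `IM_pending`, and `run` are hypothetical and chosen only for illustration. Suppose two cores both hold a shared copy and write at about the same time. The core that loses arbitration snoops the winner's invalidate while its own upgrade is still queued. The sketch handles this race by turning the stale upgrade into a full write miss.

```python
class Cache:
    """One cache's view of a single block, with transient (pending) states.

    Hypothetical model: a write is split into issuing a bus request and later
    being granted the bus, so other caches' requests can be snooped in between.
    Every bus message in this sketch is an "invalidate" or a "write miss".
    """

    def __init__(self, name):
        self.name = name
        self.state = "S"      # every cache starts with a clean shared copy
        self.pending = None   # request waiting for bus arbitration

    def request_write(self):
        # Step 1: a write to a SHARED block is detected; queue an upgrade.
        self.state = "SM_pending"
        self.pending = "invalidate"

    def grant(self):
        # Step 2: this cache wins arbitration, broadcasts its request, and
        # completes the write.
        msg, self.pending = self.pending, None
        self.state = "M"
        return msg

    def snoop(self, msg):
        # Another cache's request for this block appears on the bus.
        if self.state == "SM_pending":
            # Our shared copy was invalidated before we won the bus, so the
            # queued upgrade is stale. It must become a full write miss that
            # fetches the up-to-date block.
            self.state = "IM_pending"
            self.pending = "write miss"
        elif self.state in ("S", "M"):
            # A MODIFIED copy would be written back here before being dropped.
            self.state = "I"
        # An IM_pending request is already a write miss, so it is left as-is.


def run(caches, arbitration_order):
    """All caches try to write at once; the bus serves them one at a time."""
    for c in caches:
        c.request_write()
    bus_log = []
    for winner in arbitration_order:
        msg = winner.grant()
        bus_log.append((winner.name, msg))
        for other in caches:
            if other is not winner:
                other.snoop(msg)
    return bus_log


# Usage: P0 wins arbitration first, so P1's upgrade must turn into a write miss.
a, b = Cache("P0"), Cache("P1")
log = run([a, b], arbitration_order=[a, b])
# log == [("P0", "invalidate"), ("P1", "write miss")]; final states: P0 "I", P1 "M"
```

Without the `SM_pending` check in `snoop`, the losing cache would complete its upgrade on a copy that had already been invalidated and would never fetch the winner's data. The sketch does not model the deadlock cases that the text mentions.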

With multicore processors, coherence among the processor cores is implemented entirely on chip, using either a snooping protocol or a simple central directory protocol. Many dual-processor chips, including the Intel Xeon and AMD Opteron, supported multichip multiprocessors that could be built by connecting a high-speed interface (called QuickPath or HyperTransport, respectively)
