system is called CC-NUMA (at least by the hardware people). The software peo-
ple often call it hardware DSM because it is basically the same as software dis-
tributed shared memory but implemented by the hardware using a small page size.
One of the first NC-NUMA machines (although the name had not yet been
coined) was the Carnegie-Mellon Cm*, illustrated in simplified form in Fig. 8-32
(Swan et al., 1977). It consisted of a collection of LSI-11 CPUs, each with some
memory addressed over a local bus. (The LSI-11 was a single-chip version of the
DEC PDP-11, a minicomputer popular in the 1970s.) In addition, the LSI-11 sys-
tems were connected by a system bus. When a memory request came into the
(specially modified) MMU, a check was made to see if the word needed was in the
local memory. If so, a request was sent over the local bus to get the word. If not,
the request was routed over the system bus to the system containing the word,
which then responded. Of course, the latter took much longer than the former.
While a program could run happily out of remote memory, it took 10 times longer
to execute than the same program running out of local memory.
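The routing decision described above can be sketched as a small simulation. This is only an illustration, not the Cm* hardware's actual logic: the names `Node`, `Machine`, `LOCAL_COST`, and `REMOTE_COST` are hypothetical, memory is assumed to be split into equal contiguous ranges per node, and the 1:10 cost ratio is an assumption that loosely echoes the slowdown mentioned in the text.

```python
from dataclasses import dataclass, field

LOCAL_COST = 1    # assumed relative cost of an access over the local bus
REMOTE_COST = 10  # assumed relative cost of an access over the system bus


@dataclass
class Node:
    """One CPU plus the memory on its local bus (hypothetical model)."""
    node_id: int
    base: int                      # first address held in this node's memory
    size: int                      # number of words held locally
    memory: dict = field(default_factory=dict)

    def owns(self, addr):
        return self.base <= addr < self.base + self.size


class Machine:
    """A set of nodes joined by a system bus, with no caching."""

    def __init__(self, num_nodes, words_per_node):
        self.nodes = [
            Node(i, i * words_per_node, words_per_node)
            for i in range(num_nodes)
        ]

    def read(self, cpu_id, addr):
        """Return (value, cost) for a read issued by CPU cpu_id."""
        local = self.nodes[cpu_id]
        if local.owns(addr):
            # The word is in local memory: fetch it over the local bus.
            return local.memory.get(addr, 0), LOCAL_COST
        for node in self.nodes:
            if node.owns(addr):
                # The word lives elsewhere: route the request over the system bus.
                return node.memory.get(addr, 0), REMOTE_COST
        raise ValueError(f"address {addr} is not backed by any node")


machine = Machine(num_nodes=4, words_per_node=1024)
machine.nodes[2].memory[2053] = 99
local_read = machine.read(cpu_id=2, addr=2053)   # served by node 2 itself
remote_read = machine.read(cpu_id=0, addr=2053)  # served by node 2 on behalf of node 0
```

Because each word has exactly one home, both reads return the same value; only the cost differs.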
[Figure: four CPU-memory pairs, each on its own local bus, connected by a system bus. Labels: CPU, Memory, MMU, Local bus, System bus.]
Figure 8-32. A NUMA machine based on two levels of buses. The Cm* was the
first multiprocessor to use this design.
Memory coherence is guaranteed in an NC-NUMA machine because no cach-
ing is present. Each word of memory lives in exactly one location, so there is no
danger of one copy having stale data: there are no copies. Of course, it now mat-
ters a great deal which page is in which memory because the performance penalty
for being in the wrong place is so high. Consequently, NC-NUMA machines use
elaborate software to move pages around to maximize performance.
Typically, a daemon process called a page scanner runs every few seconds.
Its job is to examine the usage statistics and move pages around in an attempt to
improve performance. If a page appears to be in the wrong place, the page scanner
unmaps it so that the next reference to it will cause a page fault. When the fault
occurs, a decision is made about where to place the page, possibly in a different
memory. To prevent thrashing, usually there is some rule saying that once a page
is placed, it is frozen in place for a time ΔT. Various algorithms have been studied,
but the conclusion is that no one algorithm performs best under all circumstances
(LaRowe and Ellis, 1991). Best performance depends on the application.
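The scan, fault, and freeze cycle can be sketched as follows. This is a minimal illustration under stated assumptions, not any of the algorithms studied in the literature: the names `Page`, `scan`, `handle_fault`, and `FREEZE_TIME` are hypothetical, a page counts as misplaced when some other node referenced it more often than its current home did, and the fault handler simply moves the page to the faulting node.

```python
from dataclasses import dataclass, field

FREEZE_TIME = 5.0  # hypothetical freeze interval, in seconds


@dataclass
class Page:
    home: int                  # node whose memory currently holds the page
    mapped: bool = True        # False means the next reference will fault
    frozen_until: float = 0.0  # the scanner leaves the page alone before this time
    refs: dict = field(default_factory=dict)  # node id -> references since last scan


def scan(pages, now):
    """One pass of the page scanner: unmap pages that look misplaced."""
    for page in pages:
        frozen = now < page.frozen_until
        if page.refs and not frozen:
            busiest = max(page.refs, key=page.refs.get)
            if busiest != page.home:
                # Assumed policy: another node uses the page most, so unmap it
                # and let the next reference fault.
                page.mapped = False
        page.refs.clear()  # start a fresh statistics window


def handle_fault(page, faulting_node, now):
    """Place the page on a fault, then freeze it to prevent thrashing."""
    page.home = faulting_node  # assumed placement: move to the faulting node
    page.mapped = True
    page.frozen_until = now + FREEZE_TIME


page = Page(home=0, refs={0: 2, 1: 9})
scan([page], now=0.0)              # node 1 dominates, so the page is unmapped
handle_fault(page, faulting_node=1, now=0.0)
page.refs = {0: 50}
scan([page], now=1.0)              # still frozen: the page stays put
still_mapped_while_frozen = page.mapped
page.refs = {0: 50}
scan([page], now=6.0)              # freeze expired: the page is unmapped again
```

A longer freeze interval reduces page ping-ponging between nodes at the cost of reacting more slowly to real changes in access patterns, which is one reason no single setting suits every application.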