25% to 90% of the runtime will be devoted to waiting for the memory bus. (You can find
programs that run entirely in cache and have zero percent bus waits, but they are the exceptions.)
There are two primary bus designs in use in SMP machines. The first is the simple, direct-switched
bus such as the MBus, which was used in Sun's early SMP machines and the SPARCstation 10s
and 20s. The second is the more expensive, more complex, packet-switched bus (a.k.a.
split-transaction bus), which is used in the server machines from all the manufacturers (Sun's
SPARCservers, Sun's Ultra series, SGI's Challenge series, HP's PA-RISC, IBM's POWERservers,
DEC's Alpha servers, HAL's Mercury series, Cray's S6400 series, etc.). In addition to these, there
are also crossbar switches that allow several CPUs to access several different memory banks
simultaneously (Sun's Ultra servers and SGI's Origin servers).
In a direct-switched bus (Figure 16-2), memory access is very simple. When CPU 0 wants to read
a word from main memory, it asserts bus ownership, makes the request, and waits until the data is
loaded. The sequence is:
Figure 16-2. Direct-Switched Memory Bus
1. CPU 0 takes a cache miss. E$ must now go out to main memory to load an entire cache
line (typically, 8 words).
2. CPU 0 asserts bus ownership (perhaps waiting for a current owner to release).
3. CPU 0 loads the desired address onto the bus address lines, then strobes out that address
on the address strobe line.
4. Memory sees the strobe, looks at the address, finds the proper memory bank, and then
starts looking for the data. DRAM is fairly slow and takes roughly a microsecond
(depending on when you're reading this!) to find the desired data.
5. Once found, memory puts the first set of words onto the bus's data lines and strobes it into
the E$. It then loads the next set of words, strobes that out, and continues until the entire
cache-line request has been satisfied.
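The sequence above can be sketched as a small timeline model. This is only an illustration, not a description of any real bus arbiter: the `Miss` record, the `simulate_direct_switched` function, the first-come-first-served arbitration order, and the fixed 1000 ns transaction time are all assumptions made for the example. The point it shows is that the bus is owned for the whole transaction, so a CPU that misses while another CPU owns the bus must wait for it to be released.

```python
from dataclasses import dataclass


@dataclass
class Miss:
    cpu: int       # hypothetical CPU id
    issue_ns: int  # time at which this CPU takes its cache miss (step 1)


def simulate_direct_switched(misses, transaction_ns=1000):
    """Return {cpu: (bus_acquired_ns, line_loaded_ns)}, one miss per CPU.

    Assumes requests are granted in order of issue time and that the
    owner holds the bus for the entire transaction.
    """
    bus_free_at = 0
    timeline = {}
    for miss in sorted(misses, key=lambda m: m.issue_ns):
        # Step 2: assert ownership, waiting for the current owner if needed.
        acquired = max(miss.issue_ns, bus_free_at)
        # Steps 3-5: address strobe, DRAM lookup, and data transfer all
        # happen while this CPU still owns the bus.
        loaded = acquired + transaction_ns
        bus_free_at = loaded
        timeline[miss.cpu] = (acquired, loaded)
    return timeline


timeline = simulate_direct_switched([Miss(0, 0), Miss(1, 100), Miss(2, 200)])
for cpu, (acquired, loaded) in timeline.items():
    print(f"CPU {cpu}: owns bus at {acquired} ns, line loaded at {loaded} ns")
```

In this model three nearly simultaneous misses complete one after another, each a full transaction time apart, because nothing else can use the bus while memory is looking for the data.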
The total bus transaction latency, from initial request to final transfer, is on the order of 1 µs for all
machines. It simply takes DRAM that long to find the data. Once found, DRAM can deliver the
data quite rapidly, at roughly 60 ns per access, but the initial lookup is quite slow.
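A rough back-of-the-envelope calculation shows how the lookup dominates a cache-line fill. The 1 µs lookup, the 60 ns access time, and the 8-word line come from the text; the number of words moved per bus transfer (two here) and the function name `line_fill_latency_ns` are assumptions made purely for illustration.

```python
def line_fill_latency_ns(lookup_ns=1000, transfer_ns=60,
                         words_per_line=8, words_per_transfer=2):
    """Return (total latency in ns, fraction of it spent in the lookup)."""
    # Ceiling division: number of bus transfers needed to move the line.
    transfers = -(-words_per_line // words_per_transfer)
    streaming_ns = transfers * transfer_ns
    total_ns = lookup_ns + streaming_ns
    return total_ns, lookup_ns / total_ns


total_ns, lookup_share = line_fill_latency_ns()
print(f"total: {total_ns} ns, lookup share: {lookup_share:.0%}")
```

Under these assumed figures the initial lookup accounts for most of the transaction, which is why the total stays on the order of a microsecond regardless of how quickly the words are streamed out afterward.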