Direct-Switched Buses - Figure 16-2. Direct-Switched Memory Bus - Multithreaded Programming with JAVA

25% to 90% of the runtime will be devoted to waiting for the memory bus. (You can find

programs that run entirely in cache and have zero percent bus waits, but they are the exceptions.)

There are two primary bus designs in use in SMP machines. There is the simple, direct-switched

bus such as the MBus, which was used in Sun's early SMP machines and the SPARCstation 10s

and 20s. Then there is the more expensive, more complex, packet-switched bus (a.k.a. split-transaction

bus) such as is used in all the server machines from all the manufacturers (Sun's

SPARCservers, Sun's Ultra series, SGI's Challenge series, HP's PA-RISC, IBM's POWERservers,

DEC's Alpha servers, HAL's Mercury series, Cray's S6400 series, etc.). In addition to these, there

are also crossbar switches that allow several CPUs to access several different memory banks

simultaneously (Sun's Ultra servers and SGI's Origin servers).

Direct-Switched Buses

In a direct-switched bus (Figure 16-2), memory access is very simple. When CPU 0 wants to read

a word from main memory, it asserts bus ownership, makes the request, and waits until the data is

loaded. The sequence is:

Figure 16-2. Direct-Switched Memory Bus

1. CPU 0 takes a cache miss. E$ must now go out to main memory to load an entire cache

line (typically, 8 words).

2. CPU 0 asserts bus ownership (perhaps waiting for a current owner to release).

3. CPU 0 loads the desired address onto the bus address lines, then strobes out that address

on the address strobe line.

4. Memory sees the strobe, looks at the address, finds the proper memory bank, and then

starts looking for the data. DRAM is fairly slow and takes roughly a microsecond[7] to

find the desired data.

[7] Depending on when you're reading this topic!

5. Once found, memory puts the first set of words onto the bus's data lines and strobes it into

the E$. It then loads the next set of words, strobes that out, and continues until the entire

cache-line request has been satisfied.
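
The five steps above can be sketched as a toy Java model. This is only an illustration, not how any real bus is implemented: the class and method names are hypothetical, bus ownership is modeled with a ReentrantLock, the 8-word line size follows the "typically, 8 words" figure in step 1, and the two-words-per-transfer width is an assumption. The slow DRAM lookup of step 4 is not timed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of a direct-switched bus; all names here are hypothetical. */
class DirectSwitchedBusSketch {
    static final int WORDS_PER_LINE = 8;      // "typically, 8 words" per the text
    static final int WORDS_PER_TRANSFER = 2;  // assumption: width of the data lines

    private final ReentrantLock busOwnership = new ReentrantLock(); // one owner at a time
    private final int[] mainMemory;
    final List<String> trace = new ArrayList<>(); // only modified while the bus is owned

    DirectSwitchedBusSketch(int words) {      // words should be a multiple of WORDS_PER_LINE
        mainMemory = new int[words];
        for (int i = 0; i < words; i++) mainMemory[i] = i * 10; // arbitrary contents
    }

    /** Called on a cache miss (step 1): fetches the whole line containing wordAddress. */
    int[] loadCacheLine(int cpu, int wordAddress) {
        int lineStart = (wordAddress / WORDS_PER_LINE) * WORDS_PER_LINE;
        int[] line = new int[WORDS_PER_LINE];
        busOwnership.lock();                  // step 2: assert ownership, waiting if owned
        try {
            trace.add("CPU " + cpu + " owns bus");
            trace.add("address strobe: " + lineStart);   // step 3
            // step 4: memory locates the data (the lookup delay is not modeled)
            for (int off = 0; off < WORDS_PER_LINE; off += WORDS_PER_TRANSFER) { // step 5
                for (int w = 0; w < WORDS_PER_TRANSFER; w++) {
                    line[off + w] = mainMemory[lineStart + off + w];
                }
                trace.add("data strobe: words " + off + "-" + (off + WORDS_PER_TRANSFER - 1));
            }
        } finally {
            busOwnership.unlock();            // released only after the full line has arrived
        }
        return line;
    }

    public static void main(String[] args) {
        DirectSwitchedBusSketch bus = new DirectSwitchedBusSketch(64);
        int[] line = bus.loadCacheLine(0, 19); // word 19 lives in the line starting at word 16
        System.out.println(Arrays.toString(line));
        bus.trace.forEach(System.out::println);
    }
}
```

The point of the lock is that the bus stays owned for the whole transaction, so any other CPU calling loadCacheLine in the meantime simply waits, which is the behavior described in step 2.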

The total bus transaction latency, from initial request to final transfer, is on the order of 1 µs for all

machines. It simply takes DRAM that long to find the data. Once the data is found, DRAM can deliver

it quite rapidly, at roughly 60 ns per access, but the initial lookup is quite slow.
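
As a rough worked example of these figures, the following sketch adds the initial lookup time to the time for the individual transfers. The 1 µs lookup, the 60 ns per access, and the 8-word line come from the text; the two-words-per-transfer width and the class and method names are assumptions made only for illustration.

```java
/** Back-of-the-envelope latency model; the transfer width is an assumption. */
class BusLatencySketch {
    /** Lookup time plus one access time per transfer needed to move a full cache line. */
    static long transactionNanos(long lookupNanos, long accessNanos,
                                 int wordsPerLine, int wordsPerTransfer) {
        int transfers = (wordsPerLine + wordsPerTransfer - 1) / wordsPerTransfer; // ceiling
        return lookupNanos + (long) transfers * accessNanos;
    }

    public static void main(String[] args) {
        // 1,000 ns lookup, 60 ns per access, 8 words per line, 2 words per transfer (assumed)
        long total = transactionNanos(1_000, 60, 8, 2);
        System.out.println(total + " ns"); // prints "1240 ns"
    }
}
```

Under these assumptions the four transfers add only 240 ns to the 1,000 ns lookup, so the total stays on the order of 1 µs and is dominated by the initial lookup.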
