finished. However, if the request was for writing, an invalidation message must be
sent to all other boards (if any) holding a copy of it. In this way, the board making
the write request ends up with the only copy.
Now consider the case in which the requested block is in exclusive state locat-
ed on a different board. When the home board gets the request, it looks up the lo-
cation of the remote board in the directory and sends the requester a message tel-
ling where the cache line is. The requester now sends the request to the correct
boardset. When the request arrives, the board sends back the cache line. If it was
a read request, the line is marked shared and a copy sent back to the home board.
If it was a write request, the responder invalidates its copy so the new requester has
an exclusive copy.
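The message flow for a block held exclusively on a remote board can be sketched as follows. This is only an illustrative model, not the actual hardware protocol: the function name `handle_remote_exclusive`, the dictionary-based `directory` and `caches`, and the message tuples are all hypothetical. How the directory entry is updated after a write is also an assumption, since the text does not spell it out.

```python
def handle_remote_exclusive(home, directory, caches, requester, block, write):
    """Model a request for a block that is exclusive on another board.

    directory: dict mapping block -> owning board (kept by the home board)
    caches:    dict mapping board -> {block: "shared" | "exclusive"}
    Returns the list of (sender, receiver, description) messages exchanged.
    """
    owner = directory[block]  # home board looks up where the line lives
    kind = "write" if write else "read"
    messages = [
        (home, requester, f"block {block} is on board {owner}"),
        (requester, owner, f"{kind} request for block {block}"),
        (owner, requester, f"cache line for block {block}"),
    ]
    if write:
        # The responder invalidates its copy; the requester becomes sole owner.
        del caches[owner][block]
        caches[requester][block] = "exclusive"
        directory[block] = requester  # assumption: directory tracks the new owner
    else:
        # Both copies are now shared, and the home board receives a copy too.
        caches[owner][block] = "shared"
        caches[requester][block] = "shared"
        messages.append((owner, home, f"copy of block {block}"))
    return messages


if __name__ == "__main__":
    caches = {0: {}, 1: {}, 2: {7: "exclusive"}}
    directory = {7: 2}
    for msg in handle_remote_exclusive(0, directory, caches, 1, 7, write=False):
        print(msg)
```

In this model a read costs one more message than a write, because the previous owner also sends a copy of the line to the home board.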
Since each board has 2^29 memory blocks, it would take a directory with 2^29
entries to keep track of them all in the worst case. Since the directory is much
smaller than 2^29, it could happen that there is no room in the directory (which
is searched associatively) for some entries. In this case, the home directory has
to locate the block by broadcasting a request for it to all the other 17 boards.
The response crossbar switch plays a role in the directory coherence and update protocol
by handling much of the reverse traffic back to the sender. Splitting the protocol
traffic over two buses (address and response) and the data over a third bus in-
creases the throughput of the system.
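The directory-miss fallback described above can be illustrated with a small sketch. It assumes 18 boards in total (implied by "all the other 17 boards") and models the directory as a plain dictionary that may simply lack an entry. The names `locate_block`, `directory`, and `caches` are hypothetical, and the real hardware searches its directory associatively rather than through a hash table.

```python
NUM_BOARDS = 18  # the home board plus the 17 others mentioned in the text


def locate_block(directory, caches, home, block):
    """Return (owner, boards_probed) for a block homed on board `home`.

    directory: dict mapping block -> owning board; entries may be missing
               because the directory is far smaller than 2**29 entries.
    caches:    dict mapping board -> set of blocks that board currently holds.
    """
    if block in directory:
        # Directory hit: the location is known without asking anyone.
        return directory[block], 0
    # Directory miss: broadcast the request to every other board.
    others = [b for b in range(NUM_BOARDS) if b != home]
    owner = next((b for b in others if block in caches[b]), None)
    return owner, len(others)


if __name__ == "__main__":
    caches = {b: set() for b in range(NUM_BOARDS)}
    caches[5].add(42)
    print(locate_block({42: 5}, caches, home=0, block=42))  # directory hit
    print(locate_block({}, caches, home=0, block=42))       # broadcast fallback
```

The second call finds the same owner but has to probe all 17 other boards, which is the cost the directory is meant to avoid in the common case.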
By distributing the load over multiple devices on different boards, the Sun Fire
E25K is able to achieve very high performance. In addition to the 2.7 billion
snoops/sec mentioned above, the centerplane can handle up to nine simultaneous
transfers, with nine boards sending and nine boards receiving. Since the data
crossbar is 32 bytes wide, on every clock cycle 288 bytes can be moved through
the centerplane. At a clock rate of 150 MHz, this gives a peak aggregate band-
width of 40 GB/sec when all accesses are remote. If the software can place pages
in such a way as to ensure that most accesses are local, then the system bandwidth
can be appreciably higher than 40 GB/sec.
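The peak figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text (nine simultaneous transfers, a 32-byte-wide crossbar, a 150-MHz clock). The function name is hypothetical, and reading "GB" as 2^30 bytes is an assumption made here because it reproduces the quoted value of roughly 40.

```python
def centerplane_peak_bandwidth(transfers=9, width_bytes=32, clock_hz=150_000_000):
    """Return (bytes_per_cycle, bytes_per_second) for the data crossbar."""
    bytes_per_cycle = transfers * width_bytes      # 9 * 32 = 288
    bytes_per_second = bytes_per_cycle * clock_hz  # 288 * 150e6
    return bytes_per_cycle, bytes_per_second


if __name__ == "__main__":
    per_cycle, per_second = centerplane_peak_bandwidth()
    print(per_cycle)                        # bytes moved per clock cycle
    print(per_second / 10**9)               # in decimal gigabytes per second
    print(round(per_second / 2**30, 1))     # in units of 2**30 bytes per second
```

The product is 43.2 x 10^9 bytes/sec, which is about 40.2 when expressed in units of 2^30 bytes, in line with the 40 GB/sec stated above.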
For more technical information about the Sun Fire, see Charlesworth (2002)
and Charlesworth (2001).
In 2009 Oracle purchased Sun Microsystems, and they have continued devel-
opment of SPARC-based servers. The SPARC Enterprise M9000 is the successor
to the E25K. The M9000 incorporates faster quad-core SPARC processors, plus
additional memory and PCIe slots. A fully equipped M9000 server contains 256
SPARC processors, 4 TB of DRAM, and 128 PCIe I/O interfaces.
8.3.5 COMA Multiprocessors
NUMA and CC-NUMA machines have the disadvantage that references to re-
mote memory are much slower than those to local memory. In CC-NUMA, this
performance difference is hidden to some extent by the caching. Nevertheless, if
 
 