Hardware Reference
Each boardset consists of three boards: the CPU-memory board, the I/O board,
and the expander board, which connects the other two. The level 2 interconnect is
another 3 × 3 crossbar switch (on the expander board) that joins the actual memory
to the I/O ports (which are memory mapped on all UltraSPARCs). All data transfers
to or from the boardset, whether to memory or to an I/O port, pass through the
level 2 switch. Finally, data that have to be transferred to or from a remote board
pass through an 18 × 18 data crossbar switch at level 3. Data transfers are done 32
bytes at a time, so it takes two clock cycles to transfer 64 bytes, the usual transfer
unit.
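The cycle count above follows directly from the transfer width. As a minimal sketch, assuming only the two figures given in the text (32 bytes moved per clock cycle, 64-byte transfer unit), the number of data cycles can be computed as follows. The function name `transfer_cycles` is a hypothetical helper, and the calculation ignores arbitration and switch latency, which the text does not quantify.

```python
BYTES_PER_CYCLE = 32  # width of one data transfer, from the text
BLOCK_BYTES = 64      # the usual transfer unit, from the text


def transfer_cycles(num_bytes: int, bytes_per_cycle: int = BYTES_PER_CYCLE) -> int:
    """Return the clock cycles needed to move num_bytes of data.

    Counts data cycles only (hypothetical helper, not a timing model).
    """
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    # Ceiling division: a partial chunk still occupies a full cycle.
    return -(-num_bytes // bytes_per_cycle)


if __name__ == "__main__":
    print(transfer_cycles(BLOCK_BYTES))
```

For a 64-byte block this gives the two cycles stated in the text.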
Having looked at how the components are arranged, let us now consider how
the shared memory operates. At the bottom level, the 576 GB of memory is split
into 2^29 blocks of 64 bytes each. These blocks are the atomic units of the memory
system. Each block has a home board where it lives when not in use elsewhere.
Most blocks are on their home board most of the time. However, when a CPU
needs a memory block, either from its own board or one of the 17 remote ones, it
first requests a copy for its own cache, then accesses the cached copy. Although
each CPU chip on the E25K contains two CPUs, they share a single physical cache
and thus share all the blocks contained in it.
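Because the blocks are 64 bytes long, a physical address splits naturally into a block number and a byte offset within the block. The sketch below illustrates that split. The name `split_address` is hypothetical, and the mapping from a block number to its home board is deliberately left out, since the text does not say how it is done.

```python
BLOCK_BYTES = 64                            # block size, from the text
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1  # 6 bits address a byte within a block


def split_address(addr: int) -> tuple[int, int]:
    """Split a physical address into (block_number, byte_offset).

    Hypothetical helper; shows only the arithmetic implied by 64-byte blocks.
    """
    if addr < 0:
        raise ValueError("address must be non-negative")
    block_number = addr >> OFFSET_BITS      # upper bits select the block
    byte_offset = addr & (BLOCK_BYTES - 1)  # low 6 bits select the byte
    return block_number, byte_offset


if __name__ == "__main__":
    print(split_address(130))
```

All addresses that share a block number are fetched, cached, and invalidated together, which is what makes the block the atomic unit of the memory system.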
Each memory block and cache line of each CPU chip can be in one of three
states:
1. Exclusive access (for writing).
2. Shared access (for reading).
3. Invalid (i.e., empty).
When a CPU needs to read or write a memory word, it first checks its own
cache. Failing to find the word there, it issues a local request for the physical address
that is broadcast only on its own boardset. If a cache on the boardset has the
needed line, the snooping logic detects the hit and responds to the request. If the
line is in exclusive mode, it is transferred to the requester and the original copy
marked invalid. If it is in shared mode, the cache does not respond since memory
always responds when a cache line is clean.
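The local part of the protocol can be sketched as a small simulation. This is an illustration under stated assumptions, not the E25K's actual logic: the caches are modeled as plain dictionaries, the names `State` and `local_request` are hypothetical, and the state the requester ends up in after a cache-to-cache transfer (assumed exclusive here) is not stated in the text. Upgrading a shared line for writing is not modeled.

```python
from enum import Enum


class State(Enum):
    EXCLUSIVE = "exclusive"  # held for writing
    SHARED = "shared"        # held for reading
    INVALID = "invalid"      # empty


def local_request(caches, requester, block):
    """Resolve a request on one boardset and say who supplies the data.

    caches maps a chip name to a dict of {block_number: State}.
    """
    own = caches[requester]
    # Step 1: the CPU checks its own cache first.
    if own.get(block, State.INVALID) is not State.INVALID:
        return "own_cache_hit"

    # Step 2: the request is broadcast only on this boardset.
    for chip, lines in caches.items():
        if chip == requester:
            continue
        if lines.get(block, State.INVALID) is State.EXCLUSIVE:
            # An exclusive line moves to the requester; the old copy is invalidated.
            lines[block] = State.INVALID
            own[block] = State.EXCLUSIVE  # assumption: ownership moves with the line
            return "cache_to_cache"

    # Shared (clean) or absent lines: no cache answers, memory responds.
    return "memory_responds"


if __name__ == "__main__":
    caches = {"A": {}, "B": {7: State.EXCLUSIVE}, "C": {8: State.SHARED}}
    print(local_request(caches, "A", 7))
    print(local_request(caches, "A", 8))
```

The key design point is that only an exclusive line forces a cache to answer, because only then might memory be stale.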
If the snooping logic cannot find the cache line or it is present and shared, it
sends a request over the centerplane to the home board asking where the memory
block is. The state of each memory block is stored in the block's ECC bits, so the
home board can immediately determine its state. If the block is either unshared or
shared with one or more remote boards, the home memory will be up to date, and
the request can be satisfied from the home board's memory. In this case, a copy of
the cache line is transmitted over the data crossbar switch in two clock cycles, eventually
arriving at the requesting CPU.
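The home board's decision can be summarized in a few lines. The sketch below assumes a three-valued block state read from the ECC bits. The names `HomeState` and `serve_from_home` are hypothetical, and the `EXCLUSIVE_ELSEWHERE` case is inferred from the three states listed earlier rather than from an explicit statement. What the hardware does in that case is not modeled.

```python
from enum import Enum


class HomeState(Enum):
    UNSHARED = "unshared"                        # no other board holds the block
    SHARED_REMOTE = "shared_remote"              # one or more remote boards hold clean copies
    EXCLUSIVE_ELSEWHERE = "exclusive_elsewhere"  # assumption: another board may have modified it

CROSSBAR_CYCLES = 2  # 64 bytes at 32 bytes per clock cycle, from the text


def serve_from_home(state: HomeState):
    """Return (can_use_home_memory, data_cycles) for a request at the home board."""
    if state in (HomeState.UNSHARED, HomeState.SHARED_REMOTE):
        # Home memory is up to date, so it can answer directly.
        return True, CROSSBAR_CYCLES
    # Home memory may be stale; the follow-up is outside this sketch.
    return False, None


if __name__ == "__main__":
    for s in HomeState:
        print(s.value, serve_from_home(s))
```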
If the request was for reading, an entry is made in the directory at the home
board noting that a new customer is sharing the cache line and the transaction is