lock. Here is the code sequence to lock a spin lock whose address is in R1 using an atomic ex-
change:
        DADDUI R2,R0,#1
lockit: EXCH   R2,0(R1)   ;atomic exchange
        BNEZ   R2,lockit  ;already locked?
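As a concrete sketch, the same exchange-based lock can be written with C11 atomics; the `spinlock` type and function names here are our own illustration, not from the text:

```c
#include <stdatomic.h>

/* 0 = unlocked, 1 = locked, matching the assembly above. */
typedef struct { atomic_int val; } spinlock;

/* The DADDUI/EXCH/BNEZ loop: keep swapping in a 1 until the old value is 0. */
static void spin_lock(spinlock *l) {
    while (atomic_exchange(&l->val, 1) != 0)
        ;  /* spin; note that every iteration performs a write */
}

static void spin_unlock(spinlock *l) {
    atomic_store(&l->val, 0);  /* store 0 to release */
}
```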
If our multiprocessor supports cache coherence, we can cache the locks using the coherence
mechanism to maintain the lock value coherently. Caching locks has two advantages. First, it
allows an implementation where the process of “spinning” (trying to test and acquire the lock
in a tight loop) could be done on a local cached copy rather than requiring a global memory
access on each attempt to acquire the lock. The second advantage comes from the observation
that there is often locality in lock accesses; that is, the processor that used the lock last will use
it again in the near future. In such cases, the lock value may reside in the cache of that
processor, greatly reducing the time to acquire the lock.
Obtaining the first advantage—being able to spin on a local cached copy rather than
generating a memory request for each attempt to acquire the lock—requires a change in our simple
spin procedure. Each attempt to exchange in the loop directly above requires a write operation.
If multiple processors are attempting to get the lock, each will generate the write. Most of
these writes will lead to write misses, since each processor is trying to obtain the lock variable
in an exclusive state.
Thus, we should modify our spin lock procedure so that it spins by doing reads on a local
copy of the lock until it successfully sees that the lock is available. Then it attempts to acquire
the lock by doing a swap operation. A processor first reads the lock variable to test its state.
A processor keeps reading and testing until the value of the read indicates that the lock is un-
locked. The processor then races against all other processes that were similarly “spin waiting”
to see who can lock the variable first. All processes use a swap instruction that reads the old
value and stores a 1 into the lock variable. The single winner will see the 0, and the losers will
see a 1 that was placed there by the winner. (The losers will continue to set the variable to the
locked value, but that doesn't matter.) The winning processor executes the code after the lock
and, when finished, stores a 0 into the lock variable to release the lock, which starts the race
all over again. Here is the code to perform this spin lock (remember that 0 is unlocked and 1 is
locked):
lockit: LD     R2,0(R1)   ;load of lock
        BNEZ   R2,lockit  ;not available-spin
        DADDUI R2,R0,#1   ;load locked value
        EXCH   R2,0(R1)   ;swap
        BNEZ   R2,lockit  ;branch if lock wasn't 0
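A C11 sketch of this "test-and-test-and-set" loop, again with illustrative names of our own:

```c
#include <stdatomic.h>

typedef struct { atomic_int val; } spinlock;  /* 0 = unlocked, 1 = locked */

static void spin_lock(spinlock *l) {
    for (;;) {
        /* LD/BNEZ: read-only spin; while the lock is held, these reads
           hit in the local cache instead of generating bus traffic */
        while (atomic_load(&l->val) != 0)
            ;
        /* DADDUI/EXCH/BNEZ: race to swap in a 1; the one winner sees the 0 */
        if (atomic_exchange(&l->val, 1) == 0)
            return;  /* acquired */
        /* a loser saw a 1 and goes back to read-only spinning */
    }
}

static void spin_unlock(spinlock *l) {
    atomic_store(&l->val, 0);  /* release: restart the race */
}
```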
Let's examine how this “spin lock” scheme uses the cache coherence mechanisms. Figure
5.24 shows the processor and bus or directory operations for multiple processes trying to lock
a variable using an atomic swap. Once the processor with the lock stores a 0 into the lock, all
other caches are invalidated and must fetch the new value to update their copy of the lock.
One such cache gets the copy of the unlocked value (0) first and performs the swap. When the
cache miss of other processors is satisfied, they find that the variable is already locked, so they
must return to testing and spinning.
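To see this race play out, a small (hypothetical) pthreads driver can hammer such a lock from several threads; because only one winner at a time enters the critical section, the shared counter ends up exact:

```c
#include <pthread.h>
#include <stdatomic.h>

enum { NTHREADS = 4, NITERS = 10000 };

static atomic_int lockvar;  /* 0 = unlocked, 1 = locked */
static long counter;        /* shared data protected by lockvar */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        for (;;) {                              /* test-and-test-and-set */
            while (atomic_load(&lockvar) != 0)
                ;                               /* spin on cached copy */
            if (atomic_exchange(&lockvar, 1) == 0)
                break;                          /* won the race */
        }
        counter++;                              /* critical section */
        atomic_store(&lockvar, 0);              /* release the lock */
    }
    return NULL;
}
```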