Hardware Reference
In-Depth Information
The pair of instructions includes a special load called a load linked or load locked and a special
store called a store conditional . These instructions are used in sequence: If the contents of the
memory location specified by the load linked are changed before the store conditional to the
same address occurs, then the store conditional fails. If the processor does a context switch
between the two instructions, then the store conditional also fails. The store conditional is
deined to return 1 if it was successful and a 0 otherwise. Since the load linked returns the
initial value and the store conditional returns 1 only if it succeeds, the following sequence im-
plements an atomic exchange on the memory location specified by the contents of R1:
try: MOV R3,R4 ;mov exchange value
LL R2,0(R1);load linked
SC R3,0(R1);store conditional
BEQZ R3,try ;branch store fails
MOV R4,R2 ;put load value in R4
At the end of this sequence the contents of R4 and the memory location speci-fied by R1 have
been atomically exchanged (ignoring any effect from delayed branches). Anytime a processor
intervenes and modifies the value in memory between the LL and SC instructions, the SC returns
0 in R3 , causing the code sequence to try again.
An advantage of the load linked/store conditional mechanism is that it can be used to build
other synchronization primitives. For example, here is an atomic fetch-and-increment:
try: LL R2,0(R1) ;load linked
DADDUI R3,R2,#1 ;increment
SC R3,0(R1) ;store conditional
BEQZ R3,try ;branch store fails
These instructions are typically implemented by keeping track of the address specified in
the LL instruction in a register, often called the link register . If an interrupt occurs, or if the cache
block matching the address in the link register is invalidated (for example, by another SC ), the
link register is cleared. The SC instruction simply checks that its address matches that in the
link register. If so, the SC succeeds; otherwise, it fails. Since the store conditional will fail after
either another atempted store to the load linked address or any exception, care must be taken
in choosing what instructions are inserted between the two instructions. In particular, only
register-register instructions can safely be permited; otherwise, it is possible to create dead-
lock situations where the processor can never complete the SC . In addition, the number of in-
structions between the load linked and the store conditional should be small to minimize the
probability that either an unrelated event or a competing processor causes the store condition-
fail to fail frequently.
Implementing Locks Using Coherence
Once we have an atomic operation, we can use the coherence mechanisms of a multiprocessor
to implement spin locks —locks that a processor continuously tries to acquire, spinning around
a loop until it succeeds. Spin locks are used when programmers expect the lock to be held for a
very short amount of time and when they want the process of locking to be low latency when
the lock is available. Because spin locks tie up the processor, waiting in a loop for the lock to
become free, they are inappropriate in some circumstances.
The simplest implementation, which we would use if there were no cache coherence, would
be to keep the lock variables in memory. A processor could continually try to acquire the lock
using an atomic operation, say, atomic exchange from page 387, and test whether the exchange
returned the lock as free. To release the lock, the processor simply stores the value 0 to the
Search WWH ::




Custom Search