Thread-Level Parallelism - Computer Architecture: A Quantitative Approach

Hardware Reference

In-Depth Information

The pair of instructions includes a special load called a load linked or load locked and a special

store called a store conditional . These instructions are used in sequence: If the contents of the

memory location specified by the load linked are changed before the store conditional to the

same address occurs, then the store conditional fails. If the processor does a context switch

between the two instructions, then the store conditional also fails. The store conditional is

deined to return 1 if it was successful and a 0 otherwise. Since the load linked returns the

initial value and the store conditional returns 1 only if it succeeds, the following sequence im-

plements an atomic exchange on the memory location specified by the contents of R1:

try: MOV R3,R4 ;mov exchange value

LL R2,0(R1);load linked

SC R3,0(R1);store conditional

BEQZ R3,try ;branch store fails

MOV R4,R2 ;put load value in R4

At the end of this sequence the contents of R4 and the memory location speci-fied by R1 have

been atomically exchanged (ignoring any effect from delayed branches). Anytime a processor

intervenes and modifies the value in memory between the LL and SC instructions, the SC returns

0 in R3 , causing the code sequence to try again.

An advantage of the load linked/store conditional mechanism is that it can be used to build

other synchronization primitives. For example, here is an atomic fetch-and-increment:

try: LL R2,0(R1) ;load linked

DADDUI R3,R2,#1 ;increment

SC R3,0(R1) ;store conditional

BEQZ R3,try ;branch store fails

These instructions are typically implemented by keeping track of the address specified in

the LL instruction in a register, often called the link register . If an interrupt occurs, or if the cache

block matching the address in the link register is invalidated (for example, by another SC ), the

link register is cleared. The SC instruction simply checks that its address matches that in the

link register. If so, the SC succeeds; otherwise, it fails. Since the store conditional will fail after

either another atempted store to the load linked address or any exception, care must be taken

in choosing what instructions are inserted between the two instructions. In particular, only

register-register instructions can safely be permited; otherwise, it is possible to create dead-

lock situations where the processor can never complete the SC . In addition, the number of in-

structions between the load linked and the store conditional should be small to minimize the

probability that either an unrelated event or a competing processor causes the store condition-

fail to fail frequently.

Implementing Locks Using Coherence

Once we have an atomic operation, we can use the coherence mechanisms of a multiprocessor

to implement spin locks —locks that a processor continuously tries to acquire, spinning around

a loop until it succeeds. Spin locks are used when programmers expect the lock to be held for a

very short amount of time and when they want the process of locking to be low latency when

the lock is available. Because spin locks tie up the processor, waiting in a loop for the lock to

become free, they are inappropriate in some circumstances.

The simplest implementation, which we would use if there were no cache coherence, would

be to keep the lock variables in memory. A processor could continually try to acquire the lock

using an atomic operation, say, atomic exchange from page 387, and test whether the exchange

returned the lock as free. To release the lock, the processor simply stores the value 0 to the

Computer Architecture: A Quantitative Approach

Search WWH ::

Custom Search

Home