Thread-Level Parallelism - Computer Architecture: A Quantitative Approach

Hardware Reference

In-Depth Information

Basic Hardware Primitives

The key ability we require to implement synchronization in a multiprocessor is a set of hard-

ware primitives with the ability to atomically read and modify a memory location. Without

such a capability, the cost of building basic synchronization primitives will be too high and

will increase as the processor count increases. There are a number of alternative formulations

of the basic hardware primitives, all of which provide the ability to atomically read and modi-

fy a location, together with some way to tell if the read and write were performed atomically.

These hardware primitives are the basic building blocks that are used to build a wide variety

of user-level synchronization operations, including things such as locks and barriers. In gener-

all architects do not expect users to employ the basic hardware primitives, but instead expect

that the primitives will be used by system programmers to build a synchronization library,

a process that is often complex and tricky. Let's start with one such hardware primitive and

show how it can be used to build some basic synchronization operations.

One typical operation for building synchronization operations is the atomic exchange , which

interchanges a value in a register for a value in memory. To see how to use this to build a basic

synchronization operation, assume that we want to build a simple lock where the value 0 is

used to indicate that the lock is free and 1 is used to indicate that the lock is unavailable. A pro-

cessor tries to set the lock by doing an exchange of 1, which is in a register, with the memory

address corresponding to the lock. The value returned from the exchange instruction is 1 if

some other processor had already claimed access and 0 otherwise. In the latter case, the value

is also changed to 1, preventing any competing exchange from also retrieving a 0.

For example, consider two processors that each try to do the exchange simultaneously: This

race is broken since exactly one of the processors will perform the exchange first, returning 0,

and the second processor will return 1 when it does the exchange. The key to using the ex-

change (or swap) primitive to implement synchronization is that the operation is atomic: The

exchange is indivisible, and two simultaneous exchanges will be ordered by the write serial-

ization mechanisms. It is impossible for two processors trying to set the synchronization vari-

able in this manner to both think they have simultaneously set the variable.

There are a number of other atomic primitives that can be used to implement synchroniz-

ation. They all have the key property that they read and update a memory value in such a

manner that we can tell whether or not the two operations executed atomically. One opera-

tion, present in many older multiprocessors, is test-and-set , which tests a value and sets it if the

value passes the test. For example, we could define an operation that tested for 0 and set the

value to 1, which can be used in a fashion similar to how we used atomic exchange. Another

atomic synchronization primitive is fetch-and-increment : It returns the value of a memory loc-

ation and atomically increments it. By using the value 0 to indicate that the synchronization

variable is unclaimed, we can use fetch-and-increment, just as we used exchange. There are

other uses of operations like fetch-and-increment, which we will see shortly.

Implementing a single atomic memory operation introduces some challenges, since it re-

quires both a memory read and a write in a single, uninterruptible instruction. This require-

ment complicates the implementation of coherence, since the hardware cannot allow any other

operations between the read and the write, and yet must not deadlock.

An alternative is to have a pair of instructions where the second instruction returns a value

from which it can be deduced whether the pair of instructions was executed as if the instruc-

tions were atomic. The pair of instructions is effectively atomic if it appears as if all other op-

erations executed by any processor occurred before or after the pair. Thus, when an instruc-

tion pair is effectively atomic, no other processor can change the value between the instruction

pair.

Computer Architecture: A Quantitative Approach

Search WWH ::

Custom Search

Home