Thread-Level Parallelism - Computer Architecture: A Quantitative Approach


tions: one unlock (after the write) and one lock (before the read). Of course, if two processors

are writing a variable with no intervening reads, then the writes must also be separated by

synchronization operations.

It is a broadly accepted observation that most programs are synchronized. This observation is true primarily because, if the accesses were unsynchronized, the behavior of the program would likely be unpredictable: the speed of execution would determine which processor won a data race and thus affect the results of the program. Even with sequential consistency, reasoning about such programs is very difficult.

Programmers could attempt to guarantee ordering by constructing their own synchronization mechanisms, but this is extremely tricky, can lead to buggy programs, and may not be supported architecturally, meaning that the mechanisms may not work in future generations of the multiprocessor. Instead, almost all programmers will choose to use synchronization libraries that are correct and optimized for the multiprocessor and the type of synchronization.

Finally, the use of standard synchronization primitives ensures that even if the architecture

implements a more relaxed consistency model than sequential consistency, a synchronized

program will behave as if the hardware implemented sequential consistency.
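
As a minimal sketch of a synchronized program, the following Python fragment has several threads update one shared variable, with every access bracketed by a lock acquire and release from the standard threading library. The names counter, add, and run are hypothetical, chosen only for illustration, and Python's interpreter hides the hardware's consistency model, so the sketch shows the programming pattern rather than the behavior of any particular multiprocessor.

```python
import threading

counter = 0                 # shared variable (hypothetical name)
lock = threading.Lock()     # standard library synchronization primitive


def add(n):
    """Increment the shared counter n times, one locked update at a time."""
    global counter
    for _ in range(n):
        with lock:          # lock before the access, unlock after it
            counter += 1


def run(num_threads=4, increments=10_000):
    """Start num_threads workers and return the final counter value."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add, args=(increments,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


total = run()
```

Because every update is separated from the others by synchronization operations, the result does not depend on how the threads happen to be scheduled.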

Relaxed Consistency Models: The Basics

The key idea in relaxed consistency models is to allow reads and writes to complete out of order, but to use synchronization operations to enforce ordering, so that a synchronized program behaves as if the processor were sequentially consistent. There are a variety of relaxed models that are classified according to what read and write orderings they relax. We specify the orderings by a set of rules of the form X→Y, meaning that operation X must complete before operation Y is done. Sequential consistency requires maintaining all four possible orderings: R→W, R→R, W→R, and W→W. The relaxed models are defined by which of these four sets of orderings they relax:

1. Relaxing only the W→R ordering yields a model known as total store ordering or processor consistency. Because this model retains ordering among writes, many programs that operate under sequential consistency operate under this model without additional synchronization.

2. Relaxing both the W→R and the W→W orderings yields a model known as partial store order.

3. Relaxing all four orderings, that is, R→W and R→R in addition to the two above, yields a variety of models including weak ordering, the PowerPC consistency model, and release consistency, depending on the details of the ordering restrictions and how synchronization operations enforce ordering.
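
The classification above can be written down as a small lookup table. The sketch below is only an illustration and assumes the relaxations are cumulative: partial store order relaxes W→W in addition to W→R, and the third group relaxes all four orderings. The names ALL_ORDERINGS, RELAXED, preserved, and may_reorder are hypothetical.

```python
# The four orderings X -> Y that sequential consistency maintains.
ALL_ORDERINGS = {("R", "W"), ("R", "R"), ("W", "R"), ("W", "W")}

# Orderings each model relaxes (assumed cumulative, as stated in the lead-in).
RELAXED = {
    "sequential consistency": set(),
    "total store ordering": {("W", "R")},
    "partial store order": {("W", "R"), ("W", "W")},
    "weak ordering / release consistency": set(ALL_ORDERINGS),
}


def preserved(model):
    """Return the orderings the model still enforces without synchronization."""
    return ALL_ORDERINGS - RELAXED[model]


def may_reorder(model, first, second):
    """True if the model lets operation `second` complete before `first`."""
    return (first, second) in RELAXED[model]
```

For instance, may_reorder("total store ordering", "W", "R") is true, while may_reorder("total store ordering", "W", "W") is false, which reflects why total store ordering retains ordering among writes.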

By relaxing these orderings, the processor can possibly obtain significant performance advantages. There are, however, many complexities in describing relaxed consistency models, including the advantages and complexities of relaxing different orders, defining precisely what it means for a write to complete, and deciding when processors can see values that the processor itself has written. For more information about the complexities, implementation issues, and performance potential from relaxed models, we highly recommend the excellent tutorial by Adve and Gharachorloo [1996].
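
To make the effect of relaxing W→R concrete, the sketch below enumerates the outcomes of a small two-processor test in which each processor writes one variable and then reads the other. It is a simplified model, not a description of any real machine: relaxing W→R is modeled by letting a processor's read of a different variable complete before its earlier write, and the names P0, P1, thread_orders, interleavings, and outcomes are hypothetical.

```python
# Program order for each processor: write one variable, then read the other.
P0 = [("W", "x"), ("R", "y")]
P1 = [("W", "y"), ("R", "x")]


def thread_orders(ops, relax_w_to_r):
    """Return the orders in which one processor's two operations may complete."""
    orders = [list(ops)]
    first, second = ops
    # With W->R relaxed, a read of a different variable may bypass the write.
    if (relax_w_to_r and first[0] == "W" and second[0] == "R"
            and first[1] != second[1]):
        orders.append([second, first])
    return orders


def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(relax_w_to_r):
    """Return the set of (value read by P0, value read by P1) pairs."""
    results = set()
    for o0 in thread_orders(P0, relax_w_to_r):
        for o1 in thread_orders(P1, relax_w_to_r):
            tagged0 = [(0, kind, var) for kind, var in o0]
            tagged1 = [(1, kind, var) for kind, var in o1]
            for schedule in interleavings(tagged0, tagged1):
                memory = {"x": 0, "y": 0}   # both variables start at 0
                regs = {}
                for proc, kind, var in schedule:
                    if kind == "W":
                        memory[var] = 1
                    else:
                        regs[proc] = memory[var]
                results.add((regs[0], regs[1]))
    return results


sc = outcomes(relax_w_to_r=False)    # all four orderings maintained
tso = outcomes(relax_w_to_r=True)    # only W->R relaxed
```

Under this model the outcome in which both reads return 0 appears only when W→R is relaxed; a program that depended on excluding that outcome would need synchronization between the write and the read.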

Final Remarks On Consistency Models

At the present time, many multiprocessors being built support some sort of relaxed consistency model, varying from processor consistency to release consistency. Since synchronization is highly multiprocessor specific and error prone, the expectation is that most programmers
