Dekker's algorithm is basically a clever way of building locks in software for a known number of threads,
based on the assumption that writes to main memory arrive in order. Without a volatile declaration,
it cannot be implemented correctly on an out-of-order machine.
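For two threads, the algorithm can be sketched in Java, where volatile fields supply the in-order write visibility it assumes. The class and field names here are illustrative:

```java
// A two-thread sketch of Dekker's algorithm. Without volatile, the
// compiler or an out-of-order machine could reorder these stores and
// break the mutual exclusion.
class Dekker {
    private volatile boolean wants0 = false;  // thread 0 wants the lock
    private volatile boolean wants1 = false;  // thread 1 wants the lock
    private volatile int turn = 0;            // whose turn it is to proceed

    void lock(int id) {                       // id is 0 or 1
        int other = 1 - id;
        setWants(id, true);
        while (getWants(other)) {
            if (turn != id) {
                setWants(id, false);          // back off and wait our turn
                while (turn != id) { /* busy-wait */ }
                setWants(id, true);
            }
        }
    }

    void unlock(int id) {
        turn = 1 - id;                        // hand the turn to the other thread
        setWants(id, false);
    }

    private void setWants(int id, boolean v) {
        if (id == 0) wants0 = v; else wants1 = v;
    }

    private boolean getWants(int id) {
        return id == 0 ? wants0 : wants1;
    }
}
```

Note how much busy-waiting is involved; this is a demonstration of the principle, not a replacement for the locks your library already provides.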
The definition of volatile does imply that 64-bit data (e.g., doubles and longs) will be treated
atomically. On a machine that does 64-bit writes (UltraSPARC, Alpha, etc.), this is
straightforward to implement. On a machine that only does 32-bit writes (SPARC v7, x86, etc.),
this is a bit more difficult. To meet the definition of volatile, a 32-bit machine must actually
maintain a lock specifically for 64-bit volatile data.
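In Java source, obtaining this guarantee is simply a matter of the declaration; a minimal sketch (the class name is illustrative):

```java
// Declaring a long volatile obliges the JVM to read and write all 64
// bits atomically, even on a 32-bit machine where a plain long access
// may be split into two 32-bit halves.
class Sampler {
    private volatile long timestamp;   // never seen half-written by readers

    void update(long t) { timestamp = t; }
    long read()         { return timestamp; }
}
```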
It is unlikely that you will ever use volatile at all. Be careful!
In the second edition of his book, Doug Lea (see Appendix B) is including a simple example of using
volatile to manage a list. The code is amazingly complex for such a simple problem. I figure that if
it's that difficult for Doug to get right, I don't want to be doing it at all!
Atomic Reads and Writes
Most writes on most systems are either 32 or 64 bits wide, aligned on the appropriate word
boundary. Such writes are atomic--you never have to be concerned that the first 16 bits have been
written before the second 16 bits. What good does this do you? Very little. You cannot test a value
because it may change. You cannot increment a value because someone else might be using it at
the same time. You may be able to figure out a tricky way of combining this fact with a
volatile declaration to avoid using a lock, but it probably won't make your program go any
faster.
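One narrow case where that combination genuinely is safe is a one-shot flag with a single writer, such as a shutdown flag. A sketch (the class is illustrative; anything fancier than a single flag should use a lock):

```java
// A single atomic, volatile write replaces a lock here only because
// there is exactly one flag, written in one direction, by one thread.
class Worker implements Runnable {
    private volatile boolean done = false;   // set once, polled by run()

    public void run() {
        while (!done) {
            // do one unit of work per iteration
        }
    }

    void shutdown() { done = true; }         // no lock needed for a lone flag
}
```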
In Win32 there is a set of interlocked functions [InterlockedIncrement(),
InterlockedDecrement(), and InterlockedExchange()] which invoke the corresponding
interlocked instructions on x86 machines. These instructions provide functionality equivalent to
the fancy synchronization instructions we just looked at. Although some Win32 programmers will
gladly attest to their glory, they don't actually provide the programmer with much value, and they
cannot be called from Java. Use locks!
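Following that advice, the effect of InterlockedIncrement() is had in Java with an ordinary lock; a sketch (the Counter class is illustrative):

```java
// An atomic counter built the recommended way: the whole
// read-modify-write happens while holding the object's lock.
class Counter {
    private int value = 0;

    synchronized int increment() { return ++value; }  // atomic under the lock
    synchronized int get()       { return value; }
}
```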
The memory system in modern SMP machines is designed to be ignored. You shouldn't have to
spend any time thinking about it--it should just work. And it succeeds in this, to a degree. As long
as you are writing a program that is reasonably well behaved and that doesn't have overwhelming
needs for absolute maximum performance, you can skip over this section. Probably 95% of all
programs fit into this category. As for the other 5%...
In 1980, memory speeds were about the same as CPU speeds, and a machine could access main
memory in a single cycle. Since then, DRAM speeds have improved by an order of magnitude and
CPU speeds by almost four orders of magnitude. Direct main memory access now costs between 30 and 100 CPU
cycles. It is not at all unusual for a CPU to spend over half its time stalled, waiting for memory.
To the degree that you can reduce the number of main memory accesses (i.e., cache misses), you
will be handsomely paid in program performance. (N.B.: There is nothing unique to MP machines
or MT programs here.)
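As an illustration of that payoff (again, nothing MP-specific), consider summing a two-dimensional array; the method names here are ours:

```java
// Row order touches addresses sequentially, so each cache line fetched
// from main memory is fully used before eviction. Column order strides
// across rows and, on large arrays, misses in cache far more often,
// even though it computes the same result.
class Traversal {
    static long sumRowMajor(int[][] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                sum += a[i][j];          // consecutive addresses
        return sum;
    }

    static long sumColumnMajor(int[][] a) {
        long sum = 0;
        for (int j = 0; j < a[0].length; j++)
            for (int i = 0; i < a.length; i++)
                sum += a[i][j];          // strided addresses, same total
        return sum;
    }
}
```

On a large enough array, the row-order version can be several times faster purely from cache behavior, with no change in the arithmetic performed.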
Reducing Cache Misses