In Code Example 16-2 we assume that memory location address_1 will not change between the
time we read it and the time we execute the StoreConditional. If it does change, we simply
loop back and try it again. This operation is equivalent to acquiring a lock, incrementing the word,
and releasing the lock, with the exception that it is impossible to go to sleep while holding the lock.
Note that we cannot mix these instructions with normal mutex locks on the same data.
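Code Example 16-2 itself is not reproduced here, but the retry loop can be sketched in C11 (the names below are ours, not the book's). `atomic_compare_exchange_weak` plays the role of the LoadLocked/StoreConditional pair: if the word changed between the read and the store, the exchange fails and we loop back.

```c
#include <stdatomic.h>

/* Sketch: increment a word with a compare-and-swap retry loop.
   If another thread changed *addr between our load and our store,
   the exchange fails, refreshes 'old', and we simply try again. */
static void atomic_increment(atomic_int *addr)
{
    int old = atomic_load(addr);
    while (!atomic_compare_exchange_weak(addr, &old, old + 1))
        ;   /* 'old' now holds the current value; retry */
}
```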
The advantage to these instructions is that they run roughly twice as fast as mutex-protected code
and there is no danger of being context switched in the middle of execution. The disadvantage is
that the operations you can perform are very simple and may not be sufficient for our purposes.
Inserting an element at the front of a list is simple, but inserting elsewhere in the list is
impossible. (Yes, we can correctly change the next pointer of item_n, but item_n might have
been removed from the list while we were reading the next pointer!) For more general use we
need mutex locks.
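The front-of-list case can be sketched in C11 (names ours): only the head pointer is contended, so a single compare-and-swap suffices. Inserting after some `item_n` has no such single contended word, which is exactly why it cannot be done this way.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Sketch: lock-free push onto the front of a singly linked list.
   Read the head, point the new node at it, and CAS the head over.
   If the head moved under us, 'n->next' is refreshed and we retry. */
static void push_front(struct node *_Atomic *head, struct node *n)
{
    n->next = atomic_load(head);
    while (!atomic_compare_exchange_weak(head, &n->next, n))
        ;   /* head changed; n->next now holds the new head */
}
```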
The tricky part is that you can use these instructions to increment or decrement a variable
atomically, but you can't make any decisions based on its "current" value because the variable's
value may change before you make your decision. In normal code you would make a decision
based on the value of a variable while in a critical section, so that the variable couldn't change.
You will sometimes see the use of these instructions referred to as lock-free synchronization.
All of this is quite interesting of course, but it is not applicable to Java, as you have no ability to
access such low-level instructions without going through JNI. Going through JNI would both
break the program's portability (it would no longer be a pure Java program) and be slow
(crossing the JNI boundary is expensive).
Lock-Free Semaphores and Reference Counting
Semaphores need to know if a decrement attempt succeeded or not. If successful, there is nothing
else for the semaphore to do. It's done (this will be our "fast path"--the most common case).
Should the semaphore value already be zero, a bit of careful programming will allow the thread to
go to sleep confident that the next sem_post() will wake it up. This means that sem_wait()
can execute in a single instruction (we don't even have to block out signals because no lock is
being held)! Calls to sem_post() will be somewhat more complex (they have to look for
sleepers) but still very fast.
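That fast path can be sketched in C11 (the function name and shape are ours; a real `sem_wait()` would fall through to a sleep queue on failure, and `sem_post()` would check that queue):

```c
#include <stdatomic.h>

/* Sketch of the sem_wait() fast path described above: try to move
   the count down with one compare-and-swap.  Returns 1 on success.
   On failure (count already zero) a real implementation would queue
   the thread and sleep, confident the next sem_post() will wake it. */
static int sem_trywait_fast(atomic_int *count)
{
    int old = atomic_load(count);
    while (old > 0) {
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return 1;       /* decremented: the common case */
        /* 'old' was refreshed; re-test it against zero */
    }
    return 0;               /* zero: caller must take the slow path */
}
```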
Reference counting is one of the few other things that you can use such atomic instructions for,
because the only decision you make in reference counting occurs when the count hits zero. Once
it reaches zero, the count can never rise again (no pointers to the item remain, so no one can
copy one), hence you can rely on this value.
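As a sketch (C11, names ours), the decision rides entirely on the value the atomic decrement returns, which no other thread can observe again:

```c
#include <stdatomic.h>
#include <stdlib.h>

struct object {
    atomic_int refs;
    /* ... payload ... */
};

/* Sketch: lock-free reference counting.  atomic_fetch_sub returns
   the value *before* the subtraction, so a return of 1 means we
   just dropped the last reference and may safely free the object. */
static void obj_release(struct object *o)
{
    if (atomic_fetch_sub(&o->refs, 1) == 1)
        free(o);    /* we held the last reference */
}
```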
Volatile: The Rest of the Story
At last we have the background we need to discuss volatile. As in C, volatile ensures that a
variable will not be cached in either registers or memory cache. Every read of that variable will go
out to the main memory bus, and every write will result in a write on the main memory bus.
Moreover, volatile will insert a store barrier after every write. This means that things like
Dekker's algorithm will work, even on an out-of-order execution machine.
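Java's volatile (as specified by the memory model) does provide that ordering; plain C volatile does not insert hardware barriers, so a C sketch needs C11 sequentially consistent atomics to get the same guarantee. Here is a Peterson-style lock (a close cousin of Dekker's algorithm), with all names ours:

```c
#include <stdatomic.h>

/* Sketch: Peterson's two-thread lock, a close cousin of Dekker's
   algorithm.  The seq_cst atomics supply the "writes happen in
   order, reads go to memory" behavior the text ascribes to Java's
   volatile; plain C volatile would NOT be enough, since it stops
   compiler caching but inserts no memory barriers. */
static atomic_int interested[2];
static atomic_int turn;

static void peterson_lock(int self)
{
    int other = 1 - self;
    atomic_store(&interested[self], 1);
    atomic_store(&turn, other);
    while (atomic_load(&interested[other]) && atomic_load(&turn) == other)
        ;   /* spin: the other thread is in (or entering) the region */
}

static void peterson_unlock(int self)
{
    atomic_store(&interested[self], 0);
}
```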
Now, what does this give you? Very, very little. You can write something like Dekker's
algorithm to do locking instead of using locks. You can write data atomically and in order, but
for the same reasons as above, it's unlikely to give you what you want. Consider that every use of
a volatile variable requires a main memory read. Every main memory read costs upward of
100 cycles, whereas a simple, direct, nonvolatile use of the same variable would execute in one
cycle.