Java Reference
In-Depth Information
seemingly complicated CAS operation. But in reality, CAS-based counters significantly out-
perform lock-based counters if there is even a small amount of contention, and often even
if there is no contention. The fast path for uncontended lock acquisition typically requires at
least one CAS plus other lock-related housekeeping, so more work is going on in the best
case for a lock-based counter than in the normal case for the CAS-based counter. Since the
CAS succeeds most of the time (assuming low to moderate contention), the hardware will
correctly predict the branch implicit in the while loop, minimizing the overhead of the more
complicated control logic.
The language syntax for locking may be compact, but the work done by the JVM and OS
to manage locks is not. Locking entails traversing a relatively complicated code path in the
JVM and may entail OS-level locking, thread suspension, and context switches. In the best
case, locking requires at least one CAS, so using locks moves the CAS out of sight but doesn't
save any actual execution cost. On the other hand, executing a CAS from within the program
involves no JVM code, system calls, or scheduling activity. What looks like a longer code
path at the application level is in fact a much shorter code path when JVM and OS activity
are taken into account. The primary disadvantage of CAS is that it forces the caller to deal
with contention (by retrying, backing off, or giving up), whereas locks deal with contention
automatically by blocking until the lock is available. [5]
CAS performance varies widely across processors. On a single-CPU system, a CAS typically
takes on the order of a handful of clock cycles, since no synchronization across processors
is necessary. As of this writing, the cost of an uncontended CAS on multiple CPU systems
ranges from about ten to about 150 cycles; CAS performance is a rapidly moving target and
varies not only across architectures but even across versions of the same processor. Com-
petitive forces will likely result in continued CAS performance improvement over the next
several years. A good rule of thumb is that the cost of the “fast path” for uncontended lock
acquisition and release on most processors is approximately twice the cost of a CAS.
15.2.3. CAS Support in the JVM
So, how does Java code convince the processor to execute a CAS on its behalf? Prior to
Java 5.0, there was no way to do this short of writing native code. In Java 5.0, low-level
support was added to expose CAS operations on int , long , and object references, and the
JVM compiles these into the most efficient means provided by the underlying hardware. On
platforms supporting CAS, the runtime inlines them into the appropriate machine instruc-
tion(s); in the worst case, if a CAS-like instruction is not available the JVM uses a spin
lock. This low-level JVMsupport is used by the atomic variable classes ( AtomicXxx in
java.util.concurrent. atomic ) to provide an efficient CAS operation on numeric
Search WWH ::




Custom Search