The actual cost of context switching varies across platforms, but a good rule of thumb is that a context switch costs the equivalent of 5,000 to 10,000 clock cycles, or several microseconds on most current processors.

The vmstat command on Unix systems and the perfmon tool on Windows systems report the number of context switches and the percentage of time spent in the kernel. High kernel usage (over 10%) often indicates heavy scheduling activity, which may be caused by blocking due to I/O or lock contention.
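The effect described above is easy to provoke. The following demo (ours, not from the text; all names are hypothetical) makes two threads hand a lock back and forth via wait/notify, forcing a voluntary context switch on nearly every handoff. Running it while watching `vmstat 1` on Unix should show the context-switch column (cs) climb sharply.

```java
// Hypothetical demo: two threads ping-pong on one lock, each handoff
// blocking the other thread and forcing the scheduler to switch contexts.
public class PingPong {
    private final Object lock = new Object();
    private boolean pingTurn = true;   // whose turn it is, guarded by lock
    private int handoffs = 0;          // guarded by lock

    public int run(int rounds) {
        Thread t1 = new Thread(() -> bounce(true, rounds));
        Thread t2 = new Thread(() -> bounce(false, rounds));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handoffs;   // each thread contributes `rounds` handoffs
    }

    private void bounce(boolean isPing, int rounds) {
        for (int i = 0; i < rounds; i++) {
            synchronized (lock) {
                while (pingTurn != isPing) {   // not our turn: block
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                pingTurn = !isPing;            // pass the baton
                handoffs++;
                lock.notifyAll();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new PingPong().run(10_000));
    }
}
```

Because the threads strictly alternate, almost every acquisition blocks, which is exactly the "heavy scheduling activity" pattern vmstat would surface as high kernel time.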
11.3.2. Memory Synchronization
The performance cost of synchronization comes from several sources. The visibility guarantees provided by synchronized and volatile may entail using special instructions called memory barriers that can flush or invalidate caches, flush hardware write buffers, and stall execution pipelines. Memory barriers may also have indirect performance consequences because they inhibit other compiler optimizations; most operations cannot be reordered with memory barriers.
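The visibility guarantee that those barriers buy can be sketched with the standard volatile-flag pattern (the class and field names below are ours, not from the text). The volatile write publishes everything the writer did before it, and the volatile read in the spin loop prevents the reader from caching the flag in a register.

```java
// Minimal sketch of volatile's visibility guarantee: the write to `done`
// happens-before any subsequent read of `done`, so the reader is also
// guaranteed to see the ordinary write to `answer` made before it.
public class VolatileFlag {
    private volatile boolean done = false;
    private int answer = 0;   // ordinary field, published via the volatile write

    public int compute() {
        Thread writer = new Thread(() -> {
            answer = 42;      // ordinary write...
            done = true;      // ...made visible by the volatile write
        });
        writer.start();
        while (!done) { }     // volatile read each iteration; loop must terminate
        return answer;        // guaranteed to observe 42
    }

    public static void main(String[] args) {
        System.out.println(new VolatileFlag().compute());
    }
}
```

If `done` were not volatile, the JIT compiler would be free to hoist the read out of the loop and spin forever, and the reader could see `done == true` while still observing a stale `answer`.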
When assessing the performance impact of synchronization, it is important to distinguish between contended and uncontended synchronization. The synchronized mechanism is optimized for the uncontended case (volatile is always uncontended), and at this writing, the performance cost of a "fast-path" uncontended synchronization ranges from 20 to 250 clock cycles for most systems. While this is certainly not zero, the effect of needed, uncontended synchronization is rarely significant in overall application performance, and the alternative involves compromising safety and potentially signing yourself (or your successor) up for some very painful bug hunting later.
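To get a feel for the fast-path cost on your own hardware, a rough single-threaded loop like the one below (ours, not the book's) is suggestive — though numbers from a naive nanoTime loop are only ballpark figures, and a real measurement would use a benchmark harness such as JMH to account for warm-up and dead-code elimination.

```java
// Rough sketch: time a burst of uncontended synchronized blocks from a
// single thread. The lock is private and only one thread runs, so every
// acquisition takes the optimized fast path.
public class UncontendedCost {
    private final Object lock = new Object();
    private long counter = 0;   // side effect so the loop isn't optimized away

    public long timePerOpNanos(int ops) {
        long start = System.nanoTime();
        for (int i = 0; i < ops; i++) {
            synchronized (lock) {   // never contended here
                counter++;
            }
        }
        return (System.nanoTime() - start) / ops;
    }

    public static void main(String[] args) {
        UncontendedCost u = new UncontendedCost();
        u.timePerOpNanos(1_000_000);   // warm-up pass for the JIT
        System.out.println(u.timePerOpNanos(1_000_000) + " ns/op");
    }
}
```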
Modern JVMs can reduce the cost of incidental synchronization by optimizing away locking that can be proven never to contend. If a lock object is accessible only to the current thread, the JVM is permitted to optimize away a lock acquisition because there is no way another thread could synchronize on the same lock. For example, the lock acquisition in Listing 11.2 can always be eliminated by the JVM.
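Listing 11.2 itself is not reproduced in this excerpt; the pattern it refers to is synchronizing on an object no other thread can ever see. A minimal sketch of that shape (class and method names are ours):

```java
// Locking on a freshly allocated object: no other thread can ever reach
// this lock, so the acquisition has no synchronization effect and the JVM
// is free to elide it entirely.
public class UselessLock {
    public int increment(int x) {
        synchronized (new Object()) {   // thread-confined lock: elidable
            return x + 1;
        }
    }
}
```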
More sophisticated JVMs can use escape analysis to identify when a local object reference is never published to the heap and is therefore thread-local. In getStoogeNames in Listing 11.3, the only reference to the Vector is the local variable stooges, and stack-confined variables are automatically thread-local. A naive execution of getStoogeNames would acquire and release the lock on the Vector four times, once for each call to add or toString. However, a smart runtime compiler can inline these calls and then see that
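Listing 11.3 is not included in this excerpt; the following is our reconstruction of the shape the text describes — a Vector confined to one stack frame, so all four lock acquisitions (three add calls plus toString) are candidates for elision once the runtime compiler proves the reference never escapes.

```java
import java.util.Vector;

// Reconstruction (not the book's exact listing): `stooges` never leaves
// this method, so escape analysis can treat it as thread-local and the
// JVM may elide all four monitor acquisitions on the Vector.
public class Stooges {
    public String getStoogeNames() {
        Vector<String> stooges = new Vector<>();   // confined to this frame
        stooges.add("Moe");                        // lock acquisitions 1-3
        stooges.add("Larry");
        stooges.add("Curly");
        return stooges.toString();                 // lock acquisition 4
    }
}
```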