The actual cost of context switching varies across platforms, but a good rule of thumb is that
a context switch costs the equivalent of 5,000 to 10,000 clock cycles, or several microseconds
on most current processors.
The vmstat command on Unix systems and the perfmon tool on Windows systems report
the number of context switches and the percentage of time spent in the kernel. High kernel
usage (over 10%) often indicates heavy scheduling activity, which may be caused by blocking
due to I/O or lock contention.
11.3.2. Memory Synchronization
The performance cost of synchronization comes from several sources. The visibility guar-
antees provided by synchronized and volatile may entail using special instructions
called memory barriers that can flush or invalidate caches, flush hardware write buffers, and
stall execution pipelines. Memory barriers may also have indirect performance consequences
because they inhibit other compiler optimizations; most operations cannot be reordered with
memory barriers.
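As a minimal sketch (the class and field names are illustrative, not from the text), a volatile write in one thread paired with a volatile read in another is the kind of operation whose visibility guarantee may be implemented with memory-barrier instructions:

    public class ShutdownFlag {
        // A write to this volatile field may emit a store barrier; a read may
        // emit a load barrier. Both constrain how the compiler and hardware
        // may reorder surrounding memory operations.
        private volatile boolean shutdownRequested = false;

        public void requestShutdown() {
            shutdownRequested = true;      // volatile write: visible to other threads
        }

        public boolean isShutdownRequested() {
            return shutdownRequested;      // volatile read: sees the most recent write
        }
    }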
When assessing the performance impact of synchronization, it is important to distinguish
between contended and uncontended synchronization. The synchronized mechanism is
optimized for the uncontended case (volatile is always uncontended), and at this writing,
the performance cost of a “fast-path” uncontended synchronization ranges from 20 to
250 clock cycles for most systems. While this is certainly not zero, the effect of needed,
uncontended synchronization is rarely significant in overall application performance, and the
alternative involves compromising safety and potentially signing yourself (or your successor)
up for some very painful bug hunting later.
Modern JVMs can reduce the cost of incidental synchronization by optimizing away locking
that can be proven never to contend. If a lock object is accessible only to the current thread,
the JVM is permitted to optimize away a lock acquisition because there is no way another
thread could synchronize on the same lock. For example, the lock acquisition in Listing 11.2
can always be eliminated by the JVM.
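Listing 11.2 is not reproduced in this excerpt; a minimal sketch of the kind of synchronization a JVM is permitted to elide might look like the following (the method name and body are illustrative, not taken from the listing):

    public class NoOpSync {
        public void doSomething() {
            // The lock object is brand new and referenced only by this thread,
            // so no other thread can ever synchronize on it; the JVM may elide
            // the lock acquisition entirely.
            synchronized (new Object()) {
                System.out.println("doing work");
            }
        }
    }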
More sophisticated JVMs can use escape analysis to identify when a local object reference is
never published to the heap and is therefore thread-local. In getStoogeNames in Listing
11.3, the only reference to the List is the local variable stooges, and stack-confined
variables are automatically thread-local. A naive execution of getStoogeNames would
acquire and release the lock on the Vector four times, once for each call to add or
toString. However, a smart runtime compiler can inline these calls and then see that
stooges never escapes the method, allowing all four lock acquisitions to be eliminated.
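Listing 11.3 is likewise not reproduced here; a plausible reconstruction based on the description above (a local Vector, three calls to add, and one to toString) might look like this:

    import java.util.List;
    import java.util.Vector;

    public class Stooges {
        public String getStoogeNames() {
            // The Vector is referenced only by the local variable stooges,
            // so it never escapes the calling thread.
            List<String> stooges = new Vector<String>();
            stooges.add("Moe");
            stooges.add("Larry");
            stooges.add("Curly");
            // Naively: four lock acquisitions (three add calls plus toString);
            // with escape analysis the JVM can elide all of them.
            return stooges.toString();
        }
    }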