If you are lucky, the vendor of the target processor for your application will have a tool that
can be used to diagnose false sharing. Intel, for example, has a program called VTune that
can be used to help detect false sharing by inspecting cache miss events. Certain native profilers
can provide information about the number of clock cycles per instruction (CPI) for a
given line of code; a high CPI for a simple instruction within a loop can indicate that the
code is waiting to reload the target memory into the CPU cache.
Otherwise, detecting false sharing requires some intuition and experimentation. If an ordinary
profile indicates a particular loop is taking a surprising amount of time, inspect the loop
for the possibility that multiple threads may be accessing unshared variables that happen to
sit on the same cache line.
(In the realm of performance tuning as an art rather than a science, even the Intel VTune
manual says that the “primary means of avoiding false sharing is through code inspection.”)
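One way to run such an experiment is to time the same per-thread work with one thread and then with several. The sketch below is hypothetical: the class FalseSharingProbe, its methods, and the nested DataHolder with four volatile long fields are assumptions made for illustration, not code taken from the example under discussion. If the elapsed time climbs sharply as threads are added, even though no thread touches another thread's field, false sharing is a plausible suspect. Timings from a naive harness like this are noisy, so treat them as a hint and not as a measurement.

```java
// Hypothetical probe: each worker increments only the field it owns.
class FalseSharingProbe {
    // Assumed layout: four adjacent volatile longs, which may share a cache line.
    static class DataHolder {
        volatile long l1;
        volatile long l2;
        volatile long l3;
        volatile long l4;
    }

    // Builds the loop for one worker; index selects the field that worker owns.
    static Runnable task(DataHolder h, int index, long iterations) {
        switch (index) {
            case 0: return () -> { for (long i = 0; i < iterations; i++) h.l1++; };
            case 1: return () -> { for (long i = 0; i < iterations; i++) h.l2++; };
            case 2: return () -> { for (long i = 0; i < iterations; i++) h.l3++; };
            case 3: return () -> { for (long i = 0; i < iterations; i++) h.l4++; };
            default: throw new IllegalArgumentException("index must be 0..3");
        }
    }

    // Runs threadCount workers (1 to 4) and returns elapsed wall-clock nanoseconds.
    static long timeNanos(DataHolder h, int threadCount, long iterations) {
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(task(h, i, iterations));
        }
        long start = System.nanoTime();
        for (Thread t : workers) {
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long iterations = 50_000_000L;
        for (int n = 1; n <= 4; n++) {
            long nanos = timeNanos(new DataHolder(), n, iterations);
            System.out.println(n + " thread(s): " + (nanos / 1_000_000) + " ms");
        }
    }
}
```

Each run uses a fresh DataHolder, and every thread performs the same number of increments, so the per-thread work is constant and only the number of concurrent writers changes.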
Preventing false sharing requires some code changes. The ideal situation is when the variables
involved can be written less frequently. In the example above, the calculation could
take place using local variables, with only the end result written back to the DataHolder
variable. The few writes that ensue are unlikely to create contention for
the cache lines, and they won't have a performance impact even if all four threads update
their results at the same time at the end of the loop.
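A minimal sketch of that approach follows. The class LocalAccumulationSketch, its run and sum methods, and the summing workload are all assumptions made for illustration; only the idea of four threads that each own one volatile field of a DataHolder comes from the text. Each worker accumulates into a local variable and performs a single volatile write when it finishes.

```java
// Hypothetical sketch: accumulate locally, publish once per thread.
class LocalAccumulationSketch {
    // Assumed shape of the holder: one volatile long per worker thread.
    static class DataHolder {
        volatile long l1;
        volatile long l2;
        volatile long l3;
        volatile long l4;
    }

    // The loop touches only a local variable, so it generates no cache-line traffic
    // on the shared DataHolder while it runs.
    static long sum(long iterations) {
        long local = 0;
        for (long i = 0; i < iterations; i++) {
            local += i;
        }
        return local;
    }

    // Starts four workers; each writes its field exactly once, at the end.
    static DataHolder run(long iterations) {
        DataHolder holder = new DataHolder();
        Thread[] workers = {
            new Thread(() -> holder.l1 = sum(iterations)),
            new Thread(() -> holder.l2 = sum(iterations)),
            new Thread(() -> holder.l3 = sum(iterations)),
            new Thread(() -> holder.l4 = sum(iterations))
        };
        for (Thread t : workers) {
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return holder;
    }

    public static void main(String[] args) {
        DataHolder result = run(1_000_000L);
        System.out.println(result.l1 + " " + result.l2 + " " + result.l3 + " " + result.l4);
    }
}
```

The trade-off is that other threads cannot observe intermediate values, so this only fits cases where the final result is all that matters.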
A second possibility involves padding the variables so that they won't be loaded on the same
cache line. If the target CPU has 128-byte cache lines, then padding like this may work (but also,
it may not):
public class DataHolder {
    public volatile long l1;
    public long[] dummy1 = new long[128 / 8];
    public volatile long l2;
    public long[] dummy2 = new long[128 / 8];
    public volatile long l3;
    public long[] dummy3 = new long[128 / 8];
    public volatile long l4;
}
Using arrays like that is unlikely to work, because the JVM will probably rearrange the layout
of those instance variables so that all the arrays are next to each other, and then all the
long variables will still be next to each other. Moreover, each array field holds only a
reference, so the array's storage does not sit between the long fields at all. Using primitive
values to pad the structure is more likely to work, though it can be impractical because of the
number of variables required.
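Padding with primitives might look like the sketch below, which assumes the same 128-byte cache line. The class name PaddedDataHolder and the padding field names are hypothetical. Fifteen long fields (120 bytes) after each volatile long put consecutive volatile fields 128 bytes apart, provided the JVM keeps the declared order, which the Java language does not guarantee.

```java
// Hypothetical sketch: primitive padding between the hot fields.
// Assumes a 128-byte cache line and that the JVM lays fields out in declared order.
class PaddedDataHolder {
    volatile long l1;
    // 15 longs * 8 bytes = 120 bytes; with l1 itself that spans 128 bytes.
    long p01, p02, p03, p04, p05, p06, p07, p08, p09, p10, p11, p12, p13, p14, p15;

    volatile long l2;
    long q01, q02, q03, q04, q05, q06, q07, q08, q09, q10, q11, q12, q13, q14, q15;

    volatile long l3;
    long r01, r02, r03, r04, r05, r06, r07, r08, r09, r10, r11, r12, r13, r14, r15;

    volatile long l4;
}
```

The 45 padding fields are never read or written; they exist only to occupy space, which is the impracticality the text mentions. Whether the padding actually separates the fields depends on the JVM's field layout, so the result should be confirmed by measurement on the target platform.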