Java Reference
In-Depth Information
do nothing but pound on the Map; even with two threads, all attempts to access the Map are
contended. However, if an application did a significant amount of thread-local computation
each time it accessed the shared data structure, the contention level might be low enough
to offer good performance.
In this regard, TimedPutTakeTest may be a poor model for some applications. Since the
worker threads do not do very much, throughput is dominated by coordination overhead, and
this is not necessarily the case in all applications that exchange data between producers and
consumers via bounded buffers.
12.3.5. Dead Code Elimination
One of the challenges of writing good benchmarks (in any language) is that optimizing
compilers are adept at spotting and eliminating dead code: code that has no effect on the
outcome. Since benchmarks often don't compute anything, they are an easy target for the
optimizer. Most of the time, it is a good thing when the optimizer prunes dead code from a
program, but for a benchmark this is a big problem because then you are measuring less
execution than you think. If you're lucky, the optimizer will prune away your entire program,
and then it will be obvious that your data is bogus. If you're unlucky, dead-code elimination
will just speed up your program by some factor that could be explained by other means.
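As an illustration of the kind of benchmark that is exposed to this problem, here is a
minimal sketch. The class DeadCodeCandidate and its methods are hypothetical names invented
for this example; whether a given JVM actually removes the loop body depends on the compiler
and its settings, so treat this as a pattern to recognize rather than a guaranteed
reproduction.

```java
public class DeadCodeCandidate {
    // Hypothetical workload: pure computation with no side effects.
    public static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Times the workload but discards every result. Because nothing observes
    // the return values, an optimizing compiler is free to treat the calls as
    // dead code, in which case the measured time covers less work than intended.
    public static long timeDiscardingResults(int iterations, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sumOfSquares(n); // result unused: a candidate for dead-code elimination
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeDiscardingResults(100_000, 1_000);
        System.out.println("elapsed ns: " + elapsed);
    }
}
```

A suspiciously small elapsed time from a loop like this is one hint that the work being
timed may not be the work that was written.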
Dead-code elimination is a problem in benchmarking statically compiled languages too, but
detecting that the compiler has eliminated a good chunk of your benchmark is a lot easier
because you can look at the machine code and see that a part of your program is missing. With
dynamically compiled languages, that information is not easily accessible.
Many microbenchmarks perform much “better” when run with HotSpot's -server compiler
than with -client, not just because the server compiler can produce more efficient
code, but also because it is more adept at optimizing away dead code. Unfortunately, the
dead-code elimination that made such short work of your benchmark won't do quite as well with
code that actually does something. But you should still prefer -server to -client for both
production and testing on multiprocessor systems; you just have to write your tests so that
they are not susceptible to dead-code elimination.
Writing effective performance tests requires tricking the optimizer into not optimizing away
your benchmark as dead code. This requires every computed result to be used somehow by
your program—in a way that does not require synchronization or substantial computation.
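One way to meet this requirement can be sketched as follows. The class ConsumedResult and its
methods are hypothetical names invented for this example. The sketch assumes two things that
are not stated above: that folding each result into a running checksum is cheap enough not
to distort the measurement, and that comparing the checksum against System.nanoTime() is a
comparison the compiler cannot predict, so it cannot prove the results are unused. The
comparison will almost never succeed, and if it does, the only effect is a harmless space
character on the output.

```java
public class ConsumedResult {
    // Hypothetical workload: pure computation with no side effects.
    public static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Folds every computed result into a checksum that is returned to the
    // caller, so each call contributes to a value the program later uses.
    // The input varies per iteration so the calls are not all identical.
    public static long runAndConsume(int iterations, int n) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            checksum += sumOfSquares(n + i);
        }
        return checksum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long checksum = runAndConsume(100_000, 1_000);
        long elapsed = System.nanoTime() - start;

        // Cheap, unpredictable use of the result: no synchronization and
        // negligible computation, performed outside the timed region.
        if (checksum == System.nanoTime()) {
            System.out.print(" ");
        }
        System.out.println("elapsed ns: " + elapsed);
    }
}
```

The consuming comparison sits outside the timed region so that it adds nothing to the
measurement itself; only the cheap addition into the checksum happens inside the loop.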