Java Reference
In-Depth Information
12.3.3. Unrealistic Sampling of Code Paths
Runtime compilers use profiling information to help optimize the code being compiled. The
JVM is permitted to use information specific to the execution in order to produce better code,
which means that compiling method M in one program may generate different code than
compiling M in another. In some cases, the JVM may make optimizations based on assumptions
that may only be true temporarily, and later back them out by invalidating the compiled
code if they become untrue. [8]
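As an illustration of an assumption that can later become untrue, the sketch below warms up a call site that sees only one implementation of an interface and then passes it a second one. All names (SpeculationSketch, Shape, Square, Circle, warmUpThenChange) are hypothetical and not from the text. Whether a given JVM actually specializes the call and later discards the compiled code depends on the implementation and its settings; the program's result is the same either way.

```java
/** Hypothetical example: a call site that is monomorphic during warm-up only. */
final class SpeculationSketch {

    interface Shape {
        double area();
    }

    static final class Square implements Shape {
        private final double side;
        Square(double side) { this.side = side; }
        @Override public double area() { return side * side; }
    }

    static final class Circle implements Shape {
        private final double radius;
        Circle(double radius) { this.radius = radius; }
        @Override public double area() { return Math.PI * radius * radius; }
    }

    /** The call to s.area() is the call site a profiling compiler may specialize. */
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }

    /** Warms up with Square only, then introduces Circle at the same call site. */
    static double warmUpThenChange(int warmupRounds) {
        Shape[] squares = new Shape[1000];
        for (int i = 0; i < squares.length; i++) {
            squares[i] = new Square(1.0);
        }
        double total = 0;
        for (int r = 0; r < warmupRounds; r++) {
            total += totalArea(squares);      // profile sees only Square here
        }
        Shape[] mixed = { new Square(1.0), new Circle(1.0) };
        total += totalArea(mixed);            // the earlier assumption no longer holds
        return total;
    }

    public static void main(String[] args) {
        System.out.println(warmUpThenChange(20_000));
    }
}
```

A benchmark that only ever runs the warm-up phase would be timing code compiled under an assumption that a real application, which uses both implementations, would not allow.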
As a result, it is important that your test programs not only adequately approximate the usage
patterns of a typical application, but also approximate the set of code paths used by such
an application. Otherwise, a dynamic compiler could make special optimizations to a purely
single-threaded test program that could not be applied in real applications containing at least
occasional parallelism. Therefore, tests of multithreaded performance should normally be
mixed with tests of single-threaded performance, even if you want to measure only
single-threaded performance. (This issue does not arise in TimedPutTakeTest because even
the smallest test case uses two threads.)
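One way to mix the two kinds of test is to alternate them in the same JVM run, so the compiler has seen the multithreaded code path before the single-threaded timing is taken. The harness below is a minimal sketch under that assumption; MixedPathHarness, timeRun, and measureSingleThreaded are hypothetical names, not part of the book's test code, and the thread and iteration counts are arbitrary.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical harness: interleaves multithreaded and single-threaded runs. */
final class MixedPathHarness {

    /** Runs body `iterations` times on each of nThreads threads; returns elapsed ns. */
    static long timeRun(int nThreads, int iterations, Runnable body) {
        CountDownLatch startGate = new CountDownLatch(1);
        CountDownLatch endGate = new CountDownLatch(nThreads);
        for (int t = 0; t < nThreads; t++) {
            new Thread(() -> {
                try {
                    startGate.await();            // release all workers together
                    for (int i = 0; i < iterations; i++) {
                        body.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    endGate.countDown();
                }
            }).start();
        }
        long begin = System.nanoTime();
        startGate.countDown();
        try {
            endGate.await();                      // wait for every worker to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return System.nanoTime() - begin;
    }

    /** Alternates contended and single-threaded rounds; returns the last single-threaded time. */
    static long measureSingleThreaded(int rounds, int contendedThreads,
                                      int iterations, Runnable body) {
        long last = 0;
        for (int r = 0; r < rounds; r++) {
            timeRun(contendedThreads, iterations, body);  // exercise the multithreaded path
            last = timeRun(1, iterations, body);          // the measurement of interest
        }
        return last;
    }

    public static void main(String[] args) {
        Object lock = new Object();
        long[] counter = {0};
        Runnable guardedIncrement = () -> {
            synchronized (lock) { counter[0]++; }
        };
        long nanos = measureSingleThreaded(5, 4, 100_000, guardedIncrement);
        System.out.println("last single-threaded round: " + nanos
                + " ns, count = " + counter[0]);
    }
}
```

The timings from the contended rounds are discarded here; they exist only so that the single-threaded number is not taken from a program the compiler could treat as purely single-threaded.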
12.3.4. Unrealistic Degrees of Contention
Concurrent applications tend to interleave two very different sorts of work: accessing shared
data, such as fetching the next task from a shared work queue, and thread-local computation
(executing the task, assuming the task itself does not access shared data). Depending on the
relative proportions of the two types of work, the application will experience different levels
of contention and exhibit different performance and scaling behaviors.
If N threads are fetching tasks from a shared work queue and executing them, and the tasks
are compute-intensive and long-running (and do not access shared data very much), there will
be almost no contention; throughput is dominated by the availability of CPU resources. On
the other hand, if the tasks are very short-lived, there will be a lot of contention for the work
queue and throughput is dominated by the cost of synchronization.
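The effect of task length on contention can be explored with a small experiment. The following is a sketch only: ContentionSketch, spin, run, and Result are hypothetical names, and the `workPerTask` parameter is an assumed stand-in for the thread-local computation of a real task. Worker threads repeatedly poll a shared queue (the shared access) and then run a pure computation (the thread-local part).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical experiment: N workers draining a shared queue of fixed-cost tasks. */
final class ContentionSketch {

    static final class Result {
        final long nanos;      // elapsed wall-clock time
        final int completed;   // tasks actually executed
        Result(long nanos, int completed) {
            this.nanos = nanos;
            this.completed = completed;
        }
    }

    /** Thread-local "work": a computation that touches no shared state. */
    static long spin(int iterations) {
        long x = 1;
        for (int i = 0; i < iterations; i++) {
            x = x * 31 + i;
        }
        return x;
    }

    static Result run(int nThreads, int nTasks, int workPerTask) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < nTasks; i++) {
            queue.add(workPerTask);
        }
        AtomicInteger completed = new AtomicInteger();
        AtomicLong checksum = new AtomicLong();   // keeps spin() results observable
        Thread[] workers = new Thread[nThreads];
        long begin = System.nanoTime();
        for (int t = 0; t < nThreads; t++) {
            workers[t] = new Thread(() -> {
                int localCount = 0;
                long localSum = 0;
                Integer work;
                while ((work = queue.poll()) != null) {  // shared access
                    localSum += spin(work);               // thread-local computation
                    localCount++;
                }
                // Publish once per thread so the counters add little contention.
                completed.addAndGet(localCount);
                checksum.addAndGet(localSum);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new Result(System.nanoTime() - begin, completed.get());
    }

    public static void main(String[] args) {
        for (int work : new int[] {0, 100, 100_000}) {
            Result r = run(4, 10_000, work);
            System.out.println("workPerTask=" + work + " tasks=" + r.completed
                    + " elapsed=" + r.nanos + " ns");
        }
    }
}
```

With `workPerTask` near zero the workers spend nearly all their time on the queue; with a large value they spend it in `spin`. The actual timings depend on the machine and JVM, so none are shown here.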
To obtain realistic results, concurrent performance tests should try to approximate the
thread-local computation done by a typical application in addition to the concurrent coordination
under study. If the work done for each task in an application is significantly different in
nature or scope from the test program, it is easy to arrive at unwarranted conclusions about
where the performance bottlenecks lie. We saw in Section 11.5 that, for lock-based classes
such as the synchronized Map implementations, whether access to the lock is mostly
contended or mostly uncontended can have a dramatic effect on throughput. The tests in that section