When CPU was limited, however, CMS performed much worse: about 23.5% fewer TPS.
Note too that CMS could not keep the CPU close to 100% busy in this experiment. That's
because there were not sufficient CPU cycles for the background CMS threads, and so CMS
encountered concurrent mode failures. Those failures meant the JVM had to perform a
single-threaded full GC, and those periods of time (during which the four-CPU machine was
only 25% busy) drove down the average CPU utilization.
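The effect on average CPU utilization can be sketched as a time-weighted average. The following is a minimal illustration, not the measurement method used in the experiment: the class and method names are hypothetical, and the 20% share of time spent in full GC is an assumed value chosen only to show the arithmetic.

```java
final class CpuUtilizationSketch {

    // Utilization of the whole machine while only one thread is running,
    // e.g. during a single-threaded full GC: 1 of 4 CPUs is 25% busy.
    static double singleThreadedUtilization(int cpus) {
        if (cpus < 1) {
            throw new IllegalArgumentException("cpus must be >= 1");
        }
        return 1.0 / cpus;
    }

    // Time-weighted average utilization, given the fraction of wall-clock time
    // spent in single-threaded full GC and the utilization the rest of the time.
    static double averageUtilization(double fractionInFullGc, double utilizationOtherwise, int cpus) {
        if (fractionInFullGc < 0.0 || fractionInFullGc > 1.0) {
            throw new IllegalArgumentException("fractionInFullGc must be in [0, 1]");
        }
        return fractionInFullGc * singleThreadedUtilization(cpus)
                + (1.0 - fractionInFullGc) * utilizationOtherwise;
    }

    public static void main(String[] args) {
        // Assumed values: 20% of the time in full GC, fully busy otherwise.
        double avg = averageUtilization(0.20, 1.0, 4);
        System.out.printf("average CPU utilization: %.1f%%%n", avg * 100);
    }
}
```

Even a modest share of time spent in single-threaded collections pulls the average well below 100% on a four-CPU machine.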
GC algorithms and response time tests
Table 5-3 shows the same test with a think time of 250 ms between requests, which results in
a fixed load of 29 TPS. The performance metrics are then the average, 90th%, and
99th% response times for each request.
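For readers who want to compute these three metrics from their own measurements, here is a minimal sketch. It assumes the nearest-rank definition of a percentile; the text does not say which definition the test harness used, and the class name is hypothetical.

```java
import java.util.Arrays;

final class ResponseTimeStats {

    // Arithmetic mean of the recorded response times.
    static double average(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        return Arrays.stream(samples).average().getAsDouble();
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Synthetic data for illustration only: 100 samples of 0.01 s .. 1.00 s.
        double[] samples = new double[100];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = (i + 1) / 100.0;
        }
        System.out.println("average = " + average(samples));
        System.out.println("90th%   = " + percentile(samples, 90));
        System.out.println("99th%   = " + percentile(samples, 99));
    }
}
```

Other percentile definitions (for example, ones that interpolate between neighboring samples) give slightly different values on small sample sets.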
Table 5-3. Response time with different GC algorithms
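Session size   Throughput collector              CMS
               Avg     90th%   99th%   CPU       Avg     90th%   99th%   CPU
10 items       0.092   0.171   0.813   41%       0.104   0.211   0.260   46%
50 items       0.180   0.218   3.617   55%       0.107   0.222   0.315   53%
(Response times are in seconds.)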
The first test saves the previous 10 requests in the user's session state. The result here is
typical when comparing the two collectors: the throughput collector is faster than the concurrent
collector in terms of average response time and even the 90th% response time. The 99th%
response time shows a significant advantage for CMS: the full GCs in the throughput case
made 1% of the operations (those that were stopped during a full GC) take significantly
longer. CMS uses about 10% more CPU to get that improvement in the 99th% result.
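The relationship between a small fraction of stalled requests and the average can be sketched with a simple two-value model. This is an illustration only: the class name is hypothetical, and the typical time, stall times, and stalled fractions in `main` are made-up values, not measurements from the test.

```java
final class StalledRequestSketch {

    // Mean response time when a given fraction of requests is stalled
    // (for example, by a full GC) and the rest complete in the typical time.
    static double mean(double typicalSeconds, double stalledSeconds, double stalledFraction) {
        if (stalledFraction < 0.0 || stalledFraction > 1.0) {
            throw new IllegalArgumentException("stalledFraction must be in [0, 1]");
        }
        return (1.0 - stalledFraction) * typicalSeconds + stalledFraction * stalledSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical: short stalls affecting 1% of requests barely move the mean.
        System.out.printf("short stalls: mean = %.3f s%n", mean(0.085, 0.8, 0.01));
        // Hypothetical: multi-second stalls on a few percent of requests dominate it.
        System.out.printf("long stalls:  mean = %.3f s%n", mean(0.085, 3.6, 0.03));
    }
}
```

The model shows why a collector can have the better average while having the worse tail: the average only suffers once the stalls become long or frequent enough to outweigh the faster typical case.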
When 50 items were saved in the session data, GC cycles had a much bigger impact,
particularly in the throughput case. Now the average response time for the throughput collector is
much higher than that of CMS, entirely because of the very large outliers that drove the 99th%
response time over 3 seconds. Interestingly, the 90th% response time for the throughput
collector is lower than for CMS: when the JVM isn't doing those full GCs, the throughput
collector still shows an advantage.
Cases like that certainly occur from time to time, but they are far less common than the first
case. In a sense, CMS was lucky in the last case too: often when the heap contains so much
live data that the full GC time dominates the response times for the throughput collector, the