Table 9-4. Time to count an array of 100 million elements

Number of threads    ForkJoinPool    ThreadPoolExecutor
                     3.2 seconds     0.31 seconds
                     1.9 seconds     0.15 seconds
This test is from a four-CPU machine with a 4 GB fixed-size heap. The test using a
ThreadPoolExecutor required no GC at all, but each ForkJoinPool test spent about 1.2
seconds in GC. That is a significant contribution to the performance difference, but it isn't
the entire story: the overhead of creating and managing the task objects hampers the
performance of the ForkJoinPool. When a similar alternative is available, it is likely to be
faster—at least in this simple case.
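The simpler ThreadPoolExecutor approach compared here can be sketched as follows. This is a minimal sketch, not the book's exact test harness: the array size, random data, and fixed one-slice-per-thread partitioning are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedCount {
    // Count elements less than 0.5 by giving each thread exactly one fixed slice.
    public static long count(double[] d, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<Future<Long>> results = new ArrayList<>();
        int chunk = d.length / nThreads;
        for (int t = 0; t < nThreads; t++) {
            final int first = t * chunk;
            final int last = (t == nThreads - 1) ? d.length - 1 : first + chunk - 1;
            // One task per thread: no per-element task objects to create and manage.
            results.add(pool.submit(() -> {
                long subCount = 0;
                for (int i = first; i <= last; i++) {
                    if (d[i] < 0.5) subCount++;
                }
                return subCount;
            }));
        }
        long total = 0;
        for (Future<Long> f : results) total += f.get();
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        double[] d = new double[1_000_000];
        for (int i = 0; i < d.length; i++) d[i] = Math.random();
        System.out.println(count(d, 4));
    }
}
```

Because the work per element is uniform, this coarse partitioning keeps all threads equally busy and avoids the allocation overhead described above.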
An additional feature of the ForkJoinPool is that it implements work-stealing. That's basically
an implementation detail; it means that each thread in the pool has its own queue of
tasks it has forked. Threads will preferentially work on tasks from their own queue, but if
that queue is empty they will steal tasks from the queues of other threads. The upshot is that
even if one of the 2 million tasks takes a long time to execute, other threads in the
ForkJoinPool can complete any and all of the remaining tasks. The same is not true of the
ThreadPoolExecutor: if one of its tasks requires a long time, the other threads cannot pick
up additional work.
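The recursive splitting that feeds those per-thread queues can be sketched with a RecursiveTask; the array size and the split threshold below are illustrative assumptions, not values from the test above.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinCount extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed split point, not the book's value
    private final double[] d;
    private final int first, last;

    ForkJoinCount(double[] d, int first, int last) {
        this.d = d; this.first = first; this.last = last;
    }

    @Override
    protected Long compute() {
        if (last - first < THRESHOLD) {
            // Small enough: count directly.
            long subCount = 0;
            for (int i = first; i <= last; i++) {
                if (d[i] < 0.5) subCount++;
            }
            return subCount;
        }
        int mid = (first + last) >>> 1;
        ForkJoinCount left = new ForkJoinCount(d, first, mid);
        // fork() places the task on this thread's own queue; an idle pool
        // thread with an empty queue may steal it from there.
        left.fork();
        ForkJoinCount right = new ForkJoinCount(d, mid + 1, last);
        return right.compute() + left.join();
    }

    public static void main(String[] args) {
        double[] d = new double[1_000_000];
        for (int i = 0; i < d.length; i++) d[i] = Math.random();
        long n = new ForkJoinPool().invoke(new ForkJoinCount(d, 0, d.length - 1));
        System.out.println(n);
    }
}
```

Because every split produces two new task objects, a run over millions of elements creates the large number of tasks whose management cost the timing table reflects.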
The example code started by simply counting elements in the array that are less than 0.5.
What if, in addition, the code calculated a new value to store into the double array? A
nonsensical (but CPU-intensive) implementation could execute this code:
for (int i = first; i <= last; i++) {
    if (d[i] < 0.5) {
        subCount++;
        for (int j = 0; j < d.length - i; j++) {
            for (int k = 0; k < 100; k++) {
                dummy = j * k + i; // dummy is volatile, so multiple writes occur
                d[i] = dummy;
            }
        }
    }
}
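Wrapped in a minimal harness, the fragment runs as shown below. The class name and the small array size are assumptions for illustration; the volatile field is the one the inline comment refers to. Note that the inner loop bound of d.length - i means low-index elements get far more work than high-index ones, so the per-element cost is no longer uniform.

```java
public class DummyWork {
    private volatile int dummy; // volatile: each assignment is a real memory write

    // Counts elements < 0.5 in d[first..last] while doing the nonsensical
    // CPU-intensive extra work from the fragment above.
    public int process(double[] d, int first, int last) {
        int subCount = 0;
        for (int i = first; i <= last; i++) {
            if (d[i] < 0.5) {
                subCount++;
                for (int j = 0; j < d.length - i; j++) {
                    for (int k = 0; k < 100; k++) {
                        dummy = j * k + i; // dummy is volatile, so multiple writes occur
                        d[i] = dummy;
                    }
                }
            }
        }
        return subCount;
    }

    public static void main(String[] args) {
        // Kept small so the quadratic amount of busywork finishes quickly.
        double[] d = new double[1_000];
        for (int i = 0; i < d.length; i++) d[i] = Math.random();
        System.out.println(new DummyWork().process(d, 0, d.length - 1));
    }
}
```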