Because the outer loop (indexed by j) is based on the position of the element in the array, the
time a calculation takes depends on the element's position: calculating the
value for d[0] will take a very long time, while calculating the value for d[d.length - 1]
will take relatively little time.
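The shape of such a loop can be sketched as follows. This is an illustration rather than the test's exact code: the class name, the fill method, the step counter, and the inner-loop constant of 100 are all assumptions. The only property taken from the text is that the outer loop's bound shrinks as the element index grows.

```java
final class UnbalancedLoopSketch {
    // Hypothetical inner-loop length; chosen only for illustration.
    static final int INNER = 100;

    // Computes d[i] and returns how many inner-loop steps were executed.
    // The outer loop over j runs (d.length - i) times, so d[0] is the most
    // expensive element and d[d.length - 1] is the cheapest.
    static long fill(double[] d, int i) {
        long steps = 0;
        double value = 0;
        for (int j = 0; j < d.length - i; j++) {
            for (int k = 0; k < INNER; k++) {
                value = (double) j * k + i;
                steps++;
            }
        }
        d[i] = value;
        return steps;
    }

    public static void main(String[] args) {
        double[] d = new double[10_000];
        System.out.println("steps for d[0]: " + fill(d, 0));
        System.out.println("steps for d[d.length - 1]: " + fill(d, d.length - 1));
    }
}
```

Under these assumptions, the first element needs 10,000 times as many steps as the last one.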
Now the simple partitioning of the ThreadPoolExecutor test will be at a disadvantage. The
thread calculating the first partition of the array will take a very long time to complete: much
longer than the time spent by the fourth thread operating on the last partition. Once that
fourth thread is finished, it will remain idle: everything must wait for the first thread to
complete its long task.
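The size of that imbalance can be estimated with a little arithmetic. The sketch below assumes a simplified cost model in which element i costs (length - i) units of work; the class and method names are hypothetical, and the real test's per-element cost may differ by a constant factor.

```java
final class PartitionCostSketch {
    // Assumed cost model: element i costs (length - i) units of work.
    // Returns the total cost assigned to each of the equal-sized partitions.
    static long[] partitionCosts(int length, int partitions) {
        long[] cost = new long[partitions];
        int size = Math.max(1, length / partitions);
        for (int i = 0; i < length; i++) {
            // Any leftover elements fall into the last partition.
            int p = Math.min(i / size, partitions - 1);
            cost[p] += length - i;
        }
        return cost;
    }

    public static void main(String[] args) {
        long[] cost = partitionCosts(10_000, 4);
        long total = 0;
        for (long c : cost) {
            total += c;
        }
        for (int p = 0; p < cost.length; p++) {
            System.out.printf("partition %d: %.2f%% of the work%n",
                    p, 100.0 * cost[p] / total);
        }
    }
}
```

With 10,000 elements and four partitions, this model assigns about 44% of the total work to the first partition and about 6% to the last, so the first thread has roughly seven times as much to do as the fourth.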
The granularity of the 2 million tasks in the ForkJoinPool means that although one thread
will get stuck doing the very long calculations on the first 10 elements in the array, the
remaining threads will still have work to perform, and the CPU will be kept busy during most
of the test. That difference is shown in Table 9-5.
Table 9-5. Time to process an array of 10,000 elements
Number of threads ForkJoinPool ThreadPoolExecutor
When there is a single thread in the pool, the computation takes essentially the same amount
of time. That makes sense: the number of calculations is the same regardless of the pool
implementation, and since those calculations are never done in parallel, they can be expected to
take the same amount of time (though there is still some small overhead for creating the 2
million tasks). But when the pool contains four threads, the granularity of the tasks in the
ForkJoinPool gives it a decided advantage: it is able to keep the CPUs busy for almost the
entire duration of the test.
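The two approaches compared above can be sketched side by side. This is a minimal illustration, not the code behind the table: the workload in fill, the THRESHOLD leaf size, and all class and method names are assumptions. It shows only the structural difference, with many small fork/join tasks on one side and one large partition per thread on the other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.RecursiveAction;

final class PoolComparisonSketch {
    static final int THRESHOLD = 10; // hypothetical leaf size for fork/join tasks

    // Assumed unbalanced workload: the cost of element i shrinks as i grows.
    static void fill(double[] d, int first, int last) {
        for (int i = first; i <= last; i++) {
            double value = 0;
            for (int j = 0; j < d.length - i; j++) {
                value += j;
            }
            d[i] = value;
        }
    }

    // Fork/join version: halve the range until it is small enough to compute.
    static final class FillTask extends RecursiveAction {
        private final double[] d;
        private final int first;
        private final int last;

        FillTask(double[] d, int first, int last) {
            this.d = d;
            this.first = first;
            this.last = last;
        }

        @Override
        protected void compute() {
            if (last - first < THRESHOLD) {
                fill(d, first, last);
            } else {
                int mid = (first + last) >>> 1;
                invokeAll(new FillTask(d, first, mid), new FillTask(d, mid + 1, last));
            }
        }
    }

    static double[] withForkJoin(int length, int threads) {
        double[] d = new double[length];
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            pool.invoke(new FillTask(d, 0, length - 1));
        } finally {
            pool.shutdown();
        }
        return d;
    }

    // Thread pool version: one contiguous partition per thread.
    static double[] withPartitions(int length, int threads) {
        double[] d = new double[length];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            int size = (length + threads - 1) / threads;
            for (int t = 0; t < threads; t++) {
                int first = t * size;
                int last = Math.min(length, first + size) - 1;
                if (first > last) {
                    break;
                }
                futures.add(pool.submit(() -> fill(d, first, last)));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every partition, including the slowest
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return d;
    }
}
```

Both methods produce the same array. In the partitioned version the call to f.get() on the first partition is where the other threads end up waiting, whereas the fork/join version leaves small tasks available for idle threads to pick up.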
This situation is called “unbalanced,” because some tasks take longer than others (by contrast,
the tasks in the previous example, which all take about the same time, are called balanced). In
general, this leads to the recommendation that a ThreadPoolExecutor with partitioning will give
better performance when the tasks are balanced, and a ForkJoinPool will give better performance
when the tasks are unbalanced.