base), the thread cannot pick up a new task to execute in the meantime. Hence, there can be
periods where there are tasks to be executed (work to be done), but no thread is available to
execute them; the result is some idle CPU time.
In that example, the size of the thread pool should be increased. However, don't assume that
idle CPU always means the thread pool should be enlarged in order to accomplish more work.
The program may not be getting CPU cycles for the other two reasons previously mentioned:
bottlenecks in locks or in external resources. It is important to understand why the program
isn't getting CPU before determining a course of action. (See Chapter 9 for more details on
this topic.)
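The blocked-thread case can be illustrated with a minimal sketch. It assumes the tasks wait
on an external resource, simulated here with time.sleep; the names blocking_task and
run_batch are hypothetical and exist only for this illustration. A larger pool helps here
only because the threads are blocked rather than contending for a lock or a saturated
resource.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task(delay_s):
    # Stand-in for a call that waits on an external resource (assumption):
    # the thread keeps its pool slot but uses almost no CPU while it sleeps.
    time.sleep(delay_s)
    return delay_s


def run_batch(pool_size, task_count=8, delay_s=0.05):
    """Run task_count blocking tasks on a pool and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() hands each task to the next free thread; with a small pool,
        # tasks queue up even though the CPU is mostly idle.
        results = list(pool.map(blocking_task, [delay_s] * task_count))
    assert len(results) == task_count
    return time.perf_counter() - start


if __name__ == "__main__":
    small = run_batch(pool_size=2)
    large = run_batch(pool_size=8)
    print(f"pool of 2: {small:.2f}s, pool of 8: {large:.2f}s")
```

With two threads, the eight tasks run in four rounds; with eight threads, they all wait at
the same time. If the tasks were instead limited by a lock or a busy database, adding
threads would likely not shorten the run.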
Looking at the CPU usage is a first step in understanding application performance, but it is
only that: use it to see if the code is using all the CPU that can be expected, or if it points to
some synchronization or resource issue.
The CPU Run Queue
Both Windows and Unix systems allow you to monitor the number of threads that can be run
(meaning that they are not blocked on I/O, or sleeping, and so on). Unix systems refer to this
as the run queue, and several tools include the run queue length in their output. That includes
the vmstat output in the last section: the first number in each line is the length of the run
queue. Windows refers to this number as the processor queue, and reports it (among other
ways) via typeperf:
C:> typeperf -si 1 "\System\Processor Queue Length"
There is a very important difference in the output here: the run queue length number on a
Unix system (which was either 1 or 2 in the sample vmstat output) is the number of all
threads that are running or that could run if there were an available CPU. In that example,
there was always at least one thread that wanted to run: the single thread doing application
work. Hence, the run queue length was always at least 1. Keep in mind that the run queue
represents everything on the machine, so sometimes there are other threads (from completely
separate processes) that want to run, which is why the run queue length sometimes was 2 in
that sample output.
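As a rough sketch of how the run queue column might be read programmatically, the following
pulls the first number from each data line of vmstat-style output. The SAMPLE text is made up
for illustration (it is not the output from the previous section), and the function names are
hypothetical. The threads_waiting helper is an inference from the definition above: since the
Unix number counts running threads too, only the excess over the CPU count represents threads
that cannot run right now.

```python
def run_queue_lengths(vmstat_text):
    """Return the first column (run queue length) of each vmstat data line."""
    lengths = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Header lines start with text such as "procs" or "r"; skip them.
        if fields and fields[0].isdigit():
            lengths.append(int(fields[0]))
    return lengths


def threads_waiting(run_queue_length, cpu_count):
    """Estimate runnable threads that have no CPU available (Unix semantics)."""
    return max(0, run_queue_length - cpu_count)


# Hypothetical vmstat output, invented for this example.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 1797836 1229068 1508276    0    0     0     9 2250 3634 41  3 55  0  0
 1  0      0 1801772 1229076 1508284    0    0     0     8 2304 3683 43  3 54  0  0
 1  0      0 1813552 1229084 1508284    0    0     3    22 2354 3896 42  3 55  0  0
"""

if __name__ == "__main__":
    for length in run_queue_lengths(SAMPLE):
        print(length, "in run queue;", threads_waiting(length, cpu_count=1), "waiting")
```

Because vmstat reports a sample, the result is only an estimate of how many threads were
waiting at that moment.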
In Windows, the processor queue length does not include the number of threads that are
currently running. Hence, in the typeperf sample output, the processor queue number was 0,