it waits. Threaded programs can finesse this, because there is no problem with thread 4 continuing
to run while thread 1 is waiting for a page fault. The finesse yields a nice performance
improvement for many programs, even on uniprocessor machines.
Figure 15-7. Using Threads to Optimize Paging
Sometimes the amount of data that needs to be exchanged between threads for a program is very
low compared to the total computing time. For example, a chess position can be encoded into a
dozen or so bytes, whereas the time to compute the best move might be hours. Such a problem,
which also requires only a tiny amount of synchronization, can be productively spread across
thousands of very distant processors that don't even share memory. Distributed parallel systems
such as PVM are well suited to such problems.
In one of the big computer chess tournaments back in the late 1980s, one of the contestants

managed to convince several thousand of us to run a networked chess program over the weekend.
When the data/computation ratio is higher, or when more synchronization is required, distributing
across a network is not feasible, as the communication costs would exceed the CPU time needed to
execute the entire computation locally. Most image processing programs fit into this category.
Dithering a 1000 x 1000 image might take 1 second on one CPU and require very little
synchronization. Executing this program on 1000 CPUs would take only 1 ms of computation
time, yet moving that 1-meg image out and back across a network would take far longer.
Executing it on a 10-CPU shared memory multiprocessor would make far more sense, taking more
like 100 ms total.
Right Number of Threads
You want to have enough threads to keep all the CPUs busy all the time (if possible), but not so
many that the CPUs are doing unnecessary context switching. The exact number must ultimately be
determined by experiment. We give rough estimates in How Many LWPs?.