Actual data transfers are accomplished via DMA from/to disks and the network. The data is
brought into the CPU only to perform checksums; it is never written by the CPU. Checksums have
horrible data locality--they load lots of data, but use that data only once, and only for a single
addition. This means that the CPU will spend an inordinate amount of time stalled, waiting for
cache loads, but that it will do virtually no writes. (Some folks are building checksumming
hardware for exactly this purpose.) Normal programs spend more time using the data once loaded
into cache, do more writes, and generally spend less time stalled on cache misses.
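The access pattern described above can be sketched in a few lines. This is a hypothetical, simplified checksum loop (not Sun's actual NFS checksum code): each word is loaded exactly once, used for a single addition, and never written back, which is why the CPU spends its time waiting on cache-line fills rather than doing useful work.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative checksum with the locality pattern described in the
 * text: one load and one add per word, no reuse, no stores to the
 * data. Nearly every access to a large buffer is a cache miss. */
uint64_t checksum(const uint32_t *buf, size_t nwords)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < nwords; i++)
        sum += buf[i];          /* load once, add once, move on */
    return sum;
}
```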
NFS is constructed as a producer/consumer program. The master/slave design was rejected as
being inappropriate because of the nature of interrupt handling. When a network card gets a packet,
it issues an interrupt to one of the CPUs (interrupts are distributed in a round-robin fashion on
Sun's UE series). That CPU then runs its interrupt handler thread.
For an NFS request, the interrupt handler thread acts as the producer, building an NFS request
structure and putting that onto a list. It is important for the interrupt handler thread to complete
very quickly (as other interrupts will be blocked while it's running); thus it is not possible for that
thread to do any appreciable amount of work (such as processing the request or creating a new
thread). The consumers pull requests off the queue (exactly like our P/C example) and process
them as appropriate. Sometimes the required information will be in memory, but usually a disk
request will be required. This means that most requests will require a context switch.
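The division of labor above can be sketched with a standard pthreads producer/consumer queue. The structure and function names here (`nfs_request_t`, `enqueue`, `dequeue`) are illustrative, not Sun's actual data structures; the point is how little the producer (interrupt handler) does before returning, and where the consumers block.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical request structure -- a real one would carry the
 * decoded NFS operation, credentials, buffers, etc. */
typedef struct nfs_request {
    struct nfs_request *next;
    int opcode;
} nfs_request_t;

typedef struct {
    nfs_request_t  *head, *tail;
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
} request_queue_t;

/* Producer side: the interrupt handler builds the request, appends
 * it, signals a consumer, and returns -- nothing more, so other
 * interrupts are blocked only briefly. */
void enqueue(request_queue_t *q, nfs_request_t *r)
{
    r->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: worker threads block here when the queue is empty,
 * then process the request -- usually sleeping again on disk I/O,
 * which is why most requests cost a context switch. */
nfs_request_t *dequeue(request_queue_t *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)
        pthread_cond_wait(&q->nonempty, &q->lock);
    nfs_request_t *r = q->head;
    q->head = r->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return r;
}
```

Note the `while` loop around `pthread_cond_wait()`: the condition must be rechecked after waking, exactly as in the P/C example referred to in the text.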
Many of the original algorithms used in single-threaded NFS proved to be inappropriate for a
threaded program. They worked correctly, but once appropriate locking was added they suffered
from excessive contention. A major portion of the work on multithreaded NFS was spent writing
new algorithms that would cause less contention.
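One classic example of that kind of rework (a generic illustration, not Sun's actual NFS code) is replacing a single global lock over a shared table with one lock per hash bucket. Threads working on different entries then no longer serialize on a single mutex, and contention drops roughly in proportion to the number of buckets.

```c
#include <pthread.h>
#include <stddef.h>

#define NBUCKETS 64

typedef struct entry {
    struct entry *next;
    int key;
    int value;
} entry_t;

typedef struct {
    entry_t        *head;
    pthread_mutex_t lock;   /* one lock per bucket, not per table */
} bucket_t;

static bucket_t table[NBUCKETS];

static unsigned hash(int key) { return (unsigned)key % NBUCKETS; }

void init_table(void)
{
    for (int i = 0; i < NBUCKETS; i++)
        pthread_mutex_init(&table[i].lock, NULL);
}

/* Look up an entry while holding only its bucket's lock; lookups of
 * keys that hash to other buckets proceed in parallel. */
entry_t *lookup(int key)
{
    bucket_t *b = &table[hash(key)];
    pthread_mutex_lock(&b->lock);
    entry_t *e;
    for (e = b->head; e != NULL; e = e->next)
        if (e->key == key)
            break;
    pthread_mutex_unlock(&b->lock);
    return e;
}

void insert(entry_t *e)
{
    bucket_t *b = &table[hash(e->key)];
    pthread_mutex_lock(&b->lock);
    e->next = b->head;
    b->head = e;
    pthread_mutex_unlock(&b->lock);
}
```

The correctness of the table is unchanged; only the granularity of the locking is different, which is precisely the kind of "same answer, less contention" rewrite described above.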
The results? An implementation that scales extremely well to upward of 24 CPUs.
Performance tuning is a very complex issue with numerous trade-offs to be considered. Once a
performance objective and a level of effort have been established, you can start looking at the
computer science issues. Even then, the major issues will not be threading issues. Only after you've
done a great deal of normal optimization work will you turn your eyes toward threads. We give a
cursory overview of the areas you need to consider, and wish you the best of luck.