Dealing with Many Open Sockets - The Lessons of NFS - Multithreaded Programming with JAVA

Thread creation and synchronization time is quite low (about 1.5 ms on a 110-MHz SPARCstation 4, or SS4), making it reasonable to dispatch relatively small tasks to different threads. How small can a task be? Obviously, its run time must be significantly larger than the thread overhead.

Something like a 10 x 10 matrix multiply (requiring about 2000 FP ops @ 100 Mflops = 20 µs) would be much too small to thread. By contrast, a 100 x 100 matrix multiply (2M FP ops @ 100 Mflops = 20 ms) can be threaded very effectively. (An n x n multiply needs about 2n³ FP ops, n³ multiplies plus n³ adds, so the work grows a thousandfold when n grows tenfold.) If you were writing a matrix routine, your code would check the size of the matrices, running the threaded code for larger multiplies and the simple multiply in the calling thread for smaller ones. The exact dividing point will be at tasks of roughly 3 ms. You can determine it empirically, and it is not terribly important to hit it exactly.
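A minimal Java sketch of such a size-checking routine follows. The class name `ThresholdMatMul` and the `FLOP_THRESHOLD` constant are hypothetical; the threshold assumes the 3-ms dividing point and the 100-Mflops rate used above (3 ms × 100 Mflops = 300,000 FP ops, or roughly a 53 x 53 multiply, since 2n³ ≈ 300,000 gives n ≈ 53). Splitting the result rows among one thread per available processor is one reasonable partitioning, not the only one, and the sketch assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: threads a matrix multiply only when the job is big enough.
final class ThresholdMatMul {
    // Assumed cutoff: ~3 ms of work at 100 Mflops = 300,000 FP ops.
    // An n x n multiply costs about 2n^3 FP ops, so this is roughly n = 53.
    // Tune empirically on your own machine.
    static final long FLOP_THRESHOLD = 300_000L;

    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        if (b.length != n) throw new IllegalArgumentException("sketch assumes square n x n matrices");
        double[][] c = new double[n][n];
        long flops = 2L * n * n * n;
        if (flops < FLOP_THRESHOLD) {
            multiplyRows(a, b, c, 0, n); // small job: do it in the calling thread
            return c;
        }
        // Large job: give each thread a contiguous band of result rows.
        int nThreads = Runtime.getRuntime().availableProcessors();
        int rowsPer = (n + nThreads - 1) / nThreads;
        List<Thread> workers = new ArrayList<>();
        for (int start = 0; start < n; start += rowsPer) {
            final int lo = start;
            final int hi = Math.min(n, start + rowsPer);
            Thread t = new Thread(() -> multiplyRows(a, b, c, lo, hi));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join() also makes the workers' writes to c visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return c;
    }

    // Computes rows [lo, hi) of c = a * b; each thread writes only its own rows.
    static void multiplyRows(double[][] a, double[][] b, double[][] c, int lo, int hi) {
        int n = b.length;
        for (int i = lo; i < hi; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];
                for (int j = 0; j < n; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
    }
}
```

A 10 x 10 call stays in the calling thread, while a 100 x 100 call (2M FP ops) takes the threaded path; as the text says, the constant should come from measurement rather than be trusted as written.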

One ISV we worked with was running an EDA simulation containing millions of 10-µs tasks. To say the least, threading this code did not produce favorable results (it ran much slower!). They later figured out a way of grouping the microtasks into larger tasks and threading those. The opposite case is something like NFS, which contains hundreds of 40-ms tasks. Threading NFS works quite well.
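The ISV's fix, grouping microtasks into larger units before threading them, can be sketched as follows. This is a minimal illustration, not their code: `BatchedTasks`, `runBatched`, and the `microTask` function (a stand-in for one 10-µs simulation step that returns a number to combine) are all hypothetical, and the batch size is an assumption you would tune so that each batch runs for well over the thread overhead. The sketch assumes Java 8 or later (lambdas and a `java.util.concurrent` thread pool).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntToLongFunction;

// Hypothetical helper: runs many tiny tasks by dispatching them in batches.
final class BatchedTasks {
    // Applies microTask to every index in [0, count) and sums the results.
    // Each pool task covers batchSize consecutive indices, so dispatch
    // overhead is paid once per batch instead of once per microtask.
    static long runBatched(int count, int batchSize, int nThreads, IntToLongFunction microTask) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int start = 0; start < count; start += batchSize) {
                final int lo = start;
                final int hi = Math.min(count, start + batchSize);
                Callable<Long> batch = () -> {
                    long sum = 0;
                    for (int i = lo; i < hi; i++) sum += microTask.applyAsLong(i);
                    return sum;
                };
                partials.add(pool.submit(batch));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                try {
                    total += f.get(); // waits for that batch to finish
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a batch failed", e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown(); // let worker threads exit once submitted batches finish
        }
    }
}
```

With 10-µs microtasks, a batch size of 10,000 gives each dispatched task about 100 ms of work, far above the thread overhead quoted at the start of this section.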

Dealing with Many Open Sockets

In C, C++, etc., when you want to have a large number of clients connected to your server at the same time, you use a select()[8] call [in Win32 it's WaitForMultipleObjects()]. This function takes a set of file descriptors as an argument and returns when one or more of them has data ready. This allows a single thread to wait on 1000 sockets. This is a good thing, because the overhead of having 1000 threads, each waiting on a single socket (as we've done in our programs), would be prohibitive.

[8] Or poll(), which is actually more common now, due to its ability to handle very large numbers of open connections.

Unfortunately, Java does not have anything similar, which puts an extra constraint on the size and scalability of your server. In Java you must devote one thread to each client, which makes the producer/consumer version of a server awkward. Many of the major Java server programs actually use JNI calls into C to make use of select() there. There is pressure to add select() to Java.
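Java 1.4, released after this text was written, added `java.nio.channels.Selector`, which plays the role of select() for non-blocking socket channels. The minimal echo server below (hypothetical class `SelectorEchoServer`, assuming Java 8 or later) sketches one thread servicing every connected client; the echo protocol, the loopback-only binding, and the simplified write loop are illustration-only assumptions, not a production design.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical example class: one thread, many sockets, via java.nio's Selector.
final class SelectorEchoServer {
    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    SelectorEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", port)); // port 0 = any free port
        server.configureBlocking(false);                        // required before register()
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    // The single loop that replaces one-thread-per-client.
    void serve() throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            while (running) {
                selector.select(); // blocks until some channel is ready (or wakeup())
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove(); // selected keys are not cleared automatically
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept(); // may be null in non-blocking mode
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        try {
                            if (!echo(client, buf)) {
                                key.cancel();
                                client.close();
                            }
                        } catch (IOException e) { // e.g. connection reset by peer
                            key.cancel();
                            client.close();
                        }
                    }
                }
            }
        } finally {
            List<SelectableChannel> channels = new ArrayList<>();
            for (SelectionKey k : selector.keys()) channels.add(k.channel());
            selector.close();
            for (SelectableChannel c : channels) c.close();
        }
    }

    // Returns false when the client has closed its end of the connection.
    private static boolean echo(SocketChannel client, ByteBuffer buf) throws IOException {
        buf.clear();
        if (client.read(buf) < 0) return false;
        buf.flip();
        // Simplification: spins if the send buffer is full; a real server
        // would queue the data and register OP_WRITE instead.
        while (buf.hasRemaining()) client.write(buf);
        return true;
    }

    // Safe to call from another thread: makes the select() loop exit.
    void stop() {
        running = false;
        selector.wakeup();
    }
}
```

Each pass through the loop handles whichever channels are ready, so the number of threads no longer grows with the number of clients; a server could still hand the work itself to a producer/consumer pool.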

The Lessons of NFS

One practical problem in evaluating the performance of threaded programs is the lack of available data. There are simply no good analyses of real threaded programs that we can look at. (There are analyses of strictly computational parallel programs, but not of mixed-usage programs such as client/server applications.) Nobody's done it yet! Probably the best data we have comes from NFS, which we shall look at now.

The standard metric for evaluating NFS performance is the SPEC LADDIS benchmark, which uses a predefined mix of file operations intended to reflect realistic usage (lots of small file-information requests, some file reads, and a few file writes). As NFS performance goes up, LADDIS spreads the file operations over a larger number of files on more disks to eliminate trivial single-disk bottlenecks.

An NFS server is very demanding on all subsystems, and as the hardware in one area improves, NFS performance will edge up until it hits a bottleneck in another. Figure 15-8 shows configurations and performance results for a variety of systems. Notably, all of these systems are
