The Lost Wakeup - Example 7-10 The Lost Wakeup Problem - Multithreaded Programming with JAVA

It is somewhat reasonable to consider recovering from a deadlock in the case of a process dying unexpectedly. In other deadlock situations, where threads are waiting for each other, you really shouldn't be looking at recovery techniques. You should be looking at your coding techniques.

System V shared semaphores do make provision for recovery, and they may prove to be the solution to your problem. They provide room for a system-maintained "undo" structure, which will be invoked should the owner process die, and they can be reset by any process with permission. They are expensive to use, though, and add complexity to your code.

Both Win32 mutexes and UI robust mutexes also have built-in "death detection," so that your program can find out that the mutex it was waiting for was held by a thread that has just died.

Still, simply having undo structures that can reset mutexes does not solve the real problem. The protected data may be inconsistent, and that is what you have to deal with. It is possible to build arbitrarily complex undo structures for your code, but it is a significant task that should not be undertaken lightly.

Database systems do this routinely via two-phase commit strategies, because they have severe requirements for crash recovery. Essentially, what they do is (1) build a time-stamped structure containing what the database will look like at the completion of the change; (2) save that structure to disk and begin the change; (3) complete the change; (4) update the time stamp on the database; and (5) delete the structure. A crash at any point in this sequence of events can be recovered from reliably. If no saved structure exists, the database was never touched. If the structure's time stamp is newer than the database's, the change can be completed from the structure. If the time stamps match, the structure only needs to be deleted.
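The five steps can be sketched in Java as a small in-memory model. This is a hedged illustration under stated assumptions, not how any particular database implements it. The "disk" is simulated by ordinary fields, and a crash is simulated by throwing an exception during a chosen step. The saved structure is a full copy of the finished state. The names `RecoverableStore`, `update`, `recover`, and the `crashAt` test hook are all hypothetical. `Map.of` requires Java 9 or later.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the five-step recovery sequence (not a real database API). */
public class RecoverableStore {
    // These fields stand in for stable storage: they survive a simulated crash.
    final Map<String, Integer> data = new HashMap<>();
    long timestamp = 0;                     // time stamp of the last completed change
    Map<String, Integer> intent = null;     // saved structure describing the finished state
    long intentTimestamp = 0;

    /** Thrown to simulate the process dying partway through update(). */
    static class Crash extends RuntimeException {}

    /** Applies newValues. crashAt (a test hook) simulates dying during step 1..4; 0 = no crash. */
    void update(Map<String, Integer> newValues, int crashAt) {
        // (1) Build a time-stamped structure holding what the data will look like afterwards.
        Map<String, Integer> after = new HashMap<>(data);
        after.putAll(newValues);
        long ts = timestamp + 1;
        if (crashAt == 1) throw new Crash();        // nothing saved yet
        // (2) Save that structure, then begin the change.
        intent = after;
        intentTimestamp = ts;
        if (crashAt == 2) throw new Crash();
        // (3) Complete the change, one entry at a time.
        for (Map.Entry<String, Integer> e : newValues.entrySet()) {
            data.put(e.getKey(), e.getValue());
            if (crashAt == 3) throw new Crash();    // data is now half-updated
        }
        // (4) Update the time stamp on the data.
        timestamp = ts;
        if (crashAt == 4) throw new Crash();
        // (5) Delete the structure.
        intent = null;
        intentTimestamp = 0;
    }

    /** Called after a restart: redo any change whose structure was saved but not finished. */
    void recover() {
        if (intent == null) return;                 // died before step 2: old data is intact
        if (intentTimestamp > timestamp) {          // died in step 2 or 3: roll forward
            data.clear();
            data.putAll(intent);
            timestamp = intentTimestamp;
        }
        intent = null;                              // died after step 4: just discard it
        intentTimestamp = 0;
    }

    public static void main(String[] args) {
        for (int crashAt = 0; crashAt <= 4; crashAt++) {
            RecoverableStore s = new RecoverableStore();
            s.data.put("a", 100);
            s.data.put("b", 0);
            try {
                s.update(Map.of("a", 60, "b", 40), crashAt);   // move 40 from a to b
            } catch (Crash c) {
                // the "process" died; only the fields above survive
            }
            s.recover();
            System.out.println("crashAt=" + crashAt + " -> " + s.data);
        }
    }
}
```

Whatever step the simulated crash interrupts, `recover()` leaves the two entries summing to 100. The result is either entirely the old state (a crash before the structure was saved) or entirely the new one, never the half-updated mix that step 3 can produce.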

Java does not have anything similar to these recoverable mutexes, nor does it need them. Java programs are either single-process programs (in which case a deadlock is a programming bug) or they use RMI or some other kind of remote method invocation (in which case the RMI package is responsible for dealing with dead processes).

Be very, very careful when dealing with this problem!

The Lost Wakeup

If you simply neglect to hold the lock while testing or changing the value of the condition, your program will be subject to the fearsome lost wakeup problem. This occurs when one of your threads misses a wakeup signal because it had not yet gone to sleep when the signal was sent. Of course, if you're not protecting your shared data correctly, your program will be subject to numerous other bugs, so this is nothing special. In Java it is not possible to suffer the lost wakeup problem just by using wait() and notify() directly, because you must hold the object's lock before you can call notify(). wait() has the same requirement, and calling either without the lock throws IllegalMonitorStateException.
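For contrast, here is a minimal sketch of the usual Java idiom, in which both the test and the change happen while holding the object's lock. The class `StopFlag` and its methods are hypothetical names chosen for illustration.

```java
public class StopFlag {
    private boolean stopped = false;   // guarded by this object's lock

    /** Consumer side: test the condition and sleep while holding the lock. */
    public synchronized void awaitStop() throws InterruptedException {
        while (!stopped) {   // re-test after every wakeup, including spurious ones
            wait();          // releases the lock and sleeps as one atomic step
        }
    }

    /** Stopper side: change the condition and notify while holding the lock. */
    public synchronized void stop() {
        stopped = true;
        notifyAll();         // without the lock this would throw IllegalMonitorStateException
    }

    public static void main(String[] args) throws InterruptedException {
        StopFlag flag = new StopFlag();
        Thread consumer = new Thread(() -> {
            try {
                flag.awaitStop();
            } catch (InterruptedException e) {
                // exit quietly; this is only a demonstration
            }
        });
        consumer.start();
        flag.stop();         // whichever thread runs first, the wakeup is not lost
        consumer.join();
        System.out.println("consumer finished");
    }
}
```

Because `stop()` is synchronized, it cannot run while the consumer is between its test of `stopped` and its call to `wait()`. Either the consumer sees `stopped == true` and never sleeps, or it is already waiting when `notifyAll()` is called.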

However, you can create constructs in Java that will have this problem. The Mutex and ConditionVar classes that we just built are subject to lost wakeup.

In Example 7-10 (slightly modified from our StopQueue example), it is possible for the stopper (which has failed to use the lock) to decide that it's time to stop and broadcast at the very instant between the consumer's check of the condition and its going to sleep. If that happens, this code will hang.

The probability that the stopper will run at exactly the right (er, wrong) instant is very small. (In 1000 test runs of this code it did not occur once.) If we insert a slight delay in the consumer between the test and the call to condWait(), we can make it happen. (In the example code on the Web, the program LostWakeup lets you vary the sleep time, delay, to see how often it occurs on your machine.)
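The following is a hedged, self-contained sketch of the same scenario. It is not the book's Example 7-10 code. `Mutex` and `ConditionVar` here are hypothetical minimal versions built on Java monitors, and `LostWakeupSketch`, `consumerHangs`, and `delayMs` are illustrative names. As with the delay described above, a sleep between the consumer's test and its call to condWait() widens the window. A latch makes the stopper strike only after the test has been made.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class LostWakeupSketch {
    /** Hypothetical mutex built on a Java monitor (an assumption, not this chapter's class). */
    static class Mutex {
        private Thread owner = null;

        synchronized void lock() throws InterruptedException {
            while (owner != null) wait();    // sleep until the mutex is free
            owner = Thread.currentThread();
        }

        synchronized void unlock() {
            owner = null;
            notifyAll();                     // wake any thread blocked in lock()
        }
    }

    /** Hypothetical condition variable used together with Mutex. */
    static class ConditionVar {
        void condWait(Mutex m) throws InterruptedException {
            synchronized (this) {            // hold this monitor across unlock + wait so that
                m.unlock();                  // a broadcaster that holds m cannot slip in between
                wait();
            }
            m.lock();                        // reacquire the mutex before returning
        }

        // Nothing checks that the caller holds the mutex: this is where the hole is.
        synchronized void condBroadcast() {
            notifyAll();
        }
    }

    /** Returns true if the consumer is still asleep well after the stopper has broadcast. */
    static boolean consumerHangs(boolean stopperUsesLock, long delayMs) throws InterruptedException {
        Mutex m = new Mutex();
        ConditionVar cv = new ConditionVar();
        AtomicBoolean stop = new AtomicBoolean(false);   // visibility is not the bug shown here
        CountDownLatch tested = new CountDownLatch(1);

        Thread consumer = new Thread(() -> {
            try {
                m.lock();
                while (!stop.get()) {        // test the condition while holding m
                    tested.countDown();
                    Thread.sleep(delayMs);   // widen the window between the test and condWait()
                    cv.condWait(m);
                }
                m.unlock();
            } catch (InterruptedException e) {
                // give up quietly; this is only a demonstration
            }
        });
        Thread stopper = new Thread(() -> {
            try {
                tested.await();              // strike after the consumer has tested the condition
                if (stopperUsesLock) m.lock();
                stop.set(true);
                cv.condBroadcast();
                if (stopperUsesLock) m.unlock();
            } catch (InterruptedException e) {
                // give up quietly
            }
        });
        consumer.setDaemon(true);            // let the JVM exit even if the consumer hangs
        stopper.setDaemon(true);
        consumer.start();
        stopper.start();
        consumer.join(delayMs + 1000);
        return consumer.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("stopper skips the lock: hung = " + consumerHangs(false, 200));
        System.out.println("stopper uses the lock:  hung = " + consumerHangs(true, 200));
    }
}
```

When the stopper skips the lock, its broadcast arrives while the consumer is still sleeping between test and wait, so nobody hears it and the consumer then sleeps forever. When the stopper takes the mutex first, it cannot proceed until the consumer has released the mutex inside condWait(), by which point the consumer is already waiting.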

Example 7-10 The Lost Wakeup Problem
