Chapter 1. Introduction |
|
|
Figure 1-1. Performance for Digital's Alpha Servers (8400 5/625) |
Chapter 2. Concepts |
Background: Traditional Operating Systems |
|
Figure 2-1. Memory Layout for DOS-Style Operating Systems |
Figure 2-2. Memory Layout for Multitasking Systems |
Figure 2-3. Processes on a Multitasking System |
What Is a Thread? |
|
Figure 2-4. Relationship between a Process and Threads |
Figure 2-5. Process Structure and Thread Structures |
Kernel Interaction |
Concurrency vs. Parallelism |
Figure 2-6. Three Threads Running Concurrently on One CPU |
Figure 2-7. Three Threads Running in Parallel on Three CPUs |
System Calls |
Signals |
Synchronization |
Scheduling |
The Value of Using Threads |
Parallelism |
Figure 2-8. Different Threads Running on Different Processors |
Throughput |
Figure 2-9. Two Threads Making Overlapping System Calls |
Responsiveness |
Figure 2-10. Threads Overlapping Calculation and I/O |
Communications |
Figure 2-11. Different Clients Being Handled by Different Threads |
System Resources |
Distributed Objects |
Figure 2-12. Distributed Objects Running on Distinct Threads |
Same Binary for Uniprocessors and Multiprocessors |
Program Structure |
Figure 2-13. Simplified Flow of Control in Complex Applications |
What Kinds of Programs to Thread |
Inherently MT Programs |
Independent tasks |
Servers |
Repetitive tasks |
Not Obviously MT Programs |
Numerical programs |
Old code |
Automatic Threading |
Programs Not to Thread |
What About Shared Memory? |
Threads Standards |
POSIX Threads |
Win32 and OS/2 Threads |
DCE Threads |
Solaris Threads |
Performance |
Operating Systems |
NFS |
Figure 2-14. NFS Performance on MP Machines (SPEC '96) |
SPECfp 95 |
Table 2-1. SPECfp95 Results for Alpha 4100 5/466 (SPEC '97) |
SPECint_rate95 |
Figure 2-15. Running SPECrate_fp95 on an SGI Origin/200, 2000 (SPEC '96) |
Java Benchmarks |
Summary |
Chapter 3. Foundations |
Implementation vs. Specification |
Thread Libraries |
The Process Structure |
|
Figure 3-1. Process Structure in Traditional UNIX and in Solaris 2 |
Lightweight Processes |
|
Figure 3-6. Operation of a System Call |
Digital UNIX |
Linux |
Threads and LWPs |
|
Figure 3-2. Contents of a Thread |
Figure 3-3. How the Threads Library Fits into a Process |
Figure 3-4. How Java Is Built on Lower-Level Threads Libraries |
The POSIX Multithreaded Model |
|
Figure 3-5. POSIX Multithreaded Architecture |
System Calls |
Signals |
Summary |
Chapter 4. Lifecycle |
Thread Lifecycle |
|
Example 4-1 Simple Call to Create and Exit a POSIX Thread |
Example 4-2 Simple Call to Create and Exit a Java Thread |
Example 4-3 Simple Call to Create and Exit a Win32 Thread |
Exiting a Thread |
The Runnable Interface |
Example 4-4 Simple Call to Run a Runnable in a Thread |
Figure 4-1. Creating a Thread via a Runnable |
Example 4-5 Defining run() via an Inner Class in a Thread |
Waiting for Threads |
Example 4-6 Waiting for Threads to Exit |
Figure 4-2. Using thread.join() |
Who Am I? |
Example 4-7 Getting the Current Thread's Identity |
Don't Wait for Threads, Don't Return Status |
Exiting the Process |
Suspending a Thread |
Cancellation |
Example 4-8 Cancellation in the Three Libraries |
Figure 4-3. Cancellation |
ThreadDeath |
Garbage Collecting Threads |
Zombies |
Figure 4-4. Java Thread Create and Join |
Is She Still Alive? |
Restarting Threads |
An Example: Create and Join |
The Main Thread Is Different |
Example 4-9 Java Create and Join |
Example 4-10 Output for Code Example 4-9 |
Example 4-11 Construct and Start in a Single Line |
APIs Used in This Chapter |
The Class java.lang.Thread |
The Class Extensions.InterruptibleThread |
The Interface java.lang.Runnable |
Summary |
Chapter 5. Scheduling |
Different Models of Kernel Scheduling |
Many Threads on One LWP |
One Thread per LWP |
Many Threads on Many LWPs (Strict) |
The Two-Level Model |
Win32 Fibers |
Thread Scheduling |
|
Figure 5-1. The Two Basic Types of Scheduling |
Process Contention Scope |
Priority Levels |
Scheduling States |
Active: |
Runnable: |
Sleeping: |
Suspended: |
Zombie: |
Figure 5-2. Some Process Contention Scope Threads in Various States |
Figure 5-3. Simplified View of Thread State Transitions |
System Contention Scope |
Figure 5-4. Some System Contention Scope Threads in Various States |
Context Switching |
|
Figure 5-5. How a Context Switch Works |
Figure 5-6. How a Context Switch Works |
Preemption |
How Many LWPs? |
How to Get Those LWPs in Java |
Changing Scheduling Parameters for LWPs |
nice() |
Realtime LWPs |
Avoid Realtime |
Allocation Domains |
Binding LWPs to Processors |
Happiness Is a Warm Cache |
Figure 5-7. Processor Affinity |
Java Scheduling Summary |
How Many Threads in Java? |
When Should You Care About Scheduling? |
APIs Used in This Chapter |
The Class java.lang.Thread |
Summary |
Chapter 6. Synchronization |
Synchronization Issues |
|
Example 6-1 Why Synchronization Is Necessary |
Atomic Actions and Atomic Instructions |
Example 6-2 Pseudo-assembly Code for the Mutual Exclusion Lock |
Critical Sections |
Lock Your Shared Data! |
Synchronization Variables |
Mutexes |
Example 6-3 Using Mutexes in the Various Libraries |
Figure 6-1. Mutex with Several Threads Sleeping on It |
Figure 6-2. Execution Graph of the Operation of a Mutex |
Figure 6-3. Protecting a Shared List with a Mutex |
Example 6-4 Protecting a Shared List with a Mutex (POSIX) |
Figure 6-4. All Objects Have Their Own Mutex and Wait Set |
Example 6-5 Synchronized Statement |
Example 6-6 Using synchronized in Java |
Figure 6-5. Each Instance Has Its Own Mutex |
Example 6-7 Static Synchronized Methods Also Use the Class Lock |
Example 6-8 You May Use an Unrelated Object to Protect Static Data |
Example 6-9 You May Use the Class Itself to Protect Static Data |
Example 6-10 Protecting a Shared List with a Mutex (Java) |
Semaphores |
Example 6-11 Basic Use of Counting Semaphores |
Figure 6-6. How a Semaphore Operates |
Figure 6-7. Execution Graph of the Operation of a Semaphore |
Example 6-12 Classic Producer/Consumer Example (one_queue_problem.c) |
Example 6-13 Classic Producer/Consumer Example (OneQueueProblem) |
Using Barriers to Count Exiting Threads |
A Different View of Semaphores |
Figure 6-8. Flowchart for Semaphores |
Condition Variables |
Figure 6-9. Flowchart for Condition Variables |
Figure 6-10. Threads Using a Condition Variable |
Example 6-14 Using a Condition Variable (POSIX) |
Java wait/notify |
Example 6-15 Using wait/notify (Java) |
Extraneous Contention |
Figure 6-11. Extra Contention: When the Mutex Is Held by a Different Thread |
InterruptedException |
Example 6-16 Using wait/notify with InterruptedException |
Controlling the Queue Length |
Example 6-17 Classic Producer/Consumer Model (with a Tiny Bug) |
POSIX-Style Synchronization in Java |
Example 6-18 Classic Producer/Consumer in POSIX |
POSIX-Style Mutexes in Java |
Example 6-19 Implementing POSIX-Style Mutexes in Java |
POSIX-Style Condition Variables in Java |
Example 6-20 Implementing Condition Variables in Java |
Example 6-21 condWait() Done Wrong |
Example 6-22 Producer/Consumer Model Using POSIX-Style Synchronization |
A Stoppable Producer/Consumer Example |
Example 6-23 Stoppable Producer/Consumer Model |
Example 6-24 Stoppable Producer/Consumer Model (Stopper) |
Example 6-25 Stoppable Producer/Consumer Model (Starting Up and Shutting Down in main() |
APIs Used in This Chapter |
The Class java.lang.Object |
The Class Extensions.Semaphore |
The Class Extensions.Mutex |
The Class Extensions.ConditionVar |
Summary |
Chapter 7. Complexities |
Complex Locking Primitives |
Readers/Writer Locks |
Figure 7-1. How Readers/Writer Locks Work |
Figure 7-2. Execution Graph for Readers/Writer Locks |
Example 7-1 Readers/Writer Locks in Java |
Priority Inheritance Mutexes |
Figure 7-3. Priority Inversion |
FIFO Mutexes |
Figure 7-4. When FIFO Mutexes Are Valuable |
Recursive Mutexes |
Example 7-2 Recursive Locking Calls in POSIX and Java |
Nonblocking Synchronization |
Spin Locks |
Example 7-3 Simple Spin Lock |
Adaptive Spin Locks |
Java May Use Spin Locks |
Timeouts |
Elvis and the UFOs |
Example 7-4 Recalculating Timeouts |
Other Synchronization Variables |
Join |
Barriers |
Figure 7-5. Barriers |
Single Barriers |
Figure 7-6. Single Barriers |
Example 7-5 Implementing Single Barriers in Java |
Win32 Event Objects |
Win32 Critical Sections |
Multiple Wait Semaphores |
Interlocked Instructions |
Message Queues |
Win32 I/O Completion Ports |
Communicating via Streams |
Volatile |
Performance |
Condition Variables vs. wait/notify |
Coarse vs. Fine Grain Locking |
What to Lock |
Figure 7-7. Friends/Enemies with Only One Local Mutex Lock |
Double-Checked Locking |
Example 7-6 Double-Checked Locking |
Synchronization Problems |
Deadlocks |
Example 7-7 Deadlock in Java |
Figure 7-8. Typical Deadlock |
Example 7-8 Locking Mutexes Out of Order |
Race Conditions |
Example 7-9 Simplistic Race Condition |
Recovering from Deadlocks |
The Lost Wakeup |
Example 7-10 The Lost Wakeup Problem |
InterruptedException |
Example 7-11 Ignoring InterruptedException |
Example 7-12 Propagating InterruptedException |
Example 7-13 Ignoring InterruptedException |
APIs Used in This Chapter |
The Class Extensions.RWLock |
The Class Extensions.Barrier |
The Class Extensions.SingleBarrier |
Summary |
Chapter 8. TSD |
Thread-Specific Data |
|
Figure . Thread-Specific Data in POSIX |
Example 8-1 Use of POSIX TSD |
Example 8-2 Dynamic TLS in Win32 |
Java TSD |
|
Example 8-3 Implementing TSD by Subclassing Thread |
Example 8-4 Using ThreadLocal in Java |
Example 8-5 Recording the Runnable in the Thread |
Example 8-6 Recording the Runnable in Thread Local Storage |
APIs Used in This Chapter |
The Class java.lang.ThreadLocal |
Summary |
Chapter 9. Cancellation |
What Cancellation Is |
|
Figure 9-1. Thread Cancellation |
Example 9-1 Asynchronous Thread Cancellation |
Polling for Cancellation |
Asynchronous Cancellation |
Deferred Cancellation |
Using interrupt() for Deferred Cancellation |
Progressive Shutdown |
interrupt() |
|
Example 9-2 Using thread.interrupt() |
Don't Call stop() |
ThreadDeath |
Using stop() to Implement Thread.exit() |
Example 9-3 Implementing exit() |
Never Exit a Thread! |
Don't Call destroy() |
Example 9-4 From ThreadedSwing Example: NumericButtonListener.java |
Defined Cancellation/Interruption Points |
The Problem with I/O |
Not Cancelling upon Interruption |
Handling Interrupts |
Disabling Interrupts |
Example 9-5 Testing a Variable from an Exception Handler |
Ignore Interrupts |
Example 9-6 Inventing an InterruptDisabled Protocol |
Exit on interrupt() |
Propagate InterruptedException |
Reinterrupt |
Example 9-7 Calling interrupt() upon Return |
Example 9-8 Naive Condition Variable and Readers/Writer Lock |
Example 9-9 Handling Interruptions from condWait() the Hard Way |
Example 9-10 The Right Way of Implementing condWait() |
Cancellation State |
A Cancellation Example |
|
Figure 9-2. Cancellation Example Program |
Example 9-11 Using interrupt() to Cancel Searcher Threads |
Using Cancellation |
Ensuring Bounded CPU Time |
Example 9-12 Deferred Cancellation as Polling |
Interrupting Computational Loops |
Example 9-13 Testing Once Every 1000 Iterations |
Interrupting Blocked Threads |
Interrupting Sockets |
What Should Throw InterruptedException? |
Interrupting Sleeping Threads |
Interruption in wait() |
Example 9-14 Interrupting a wait() |
The Morning After |
Example 9-15 From the main() Method for Searcher Example |
Cleanup |
|
Example 9-16 How POSIX Cleanup Handlers Are Used |
Example 9-17 How InterruptedException Handlers Clean Up |
Implementing enableInterrupts() |
|
Example 9-18 Implementing enableInterrupts() |
A Cancellation Example (Improved) |
|
Example 9-19 Cleaner Version of doDatabaseThing() |
Simple Polling |
|
Example 9-20 1Implementing the Searcher with Polling |
APIs Used in This Chapter |
The Class java.lang.Thread |
The Class Extensions.InterruptibleThread |
Summary |
Chapter 10. Details |
Thread Groups |
Thread Security |
|
Example 10-1 Checking for Security Violations |
Example 10-2 A Simple Security Exception |
Table 10-1. Thread Class Methods That May Cause a Security Check |
Table 10-2. ThreadGroup Class Methods That May Cause a Security Check |
Real-World Examples |
1. The garbage collector thread and the finalize() method |
2. Performance: Access to synchronized objects |
3. More problems with the garbage collection thread |
4. Make synchronized code sections as small as possible |
Example 10-3 Synchronizing Part of a Method> |
5. Threaded class downloads |
General Tips and Hints |
Daemon Threads |
Daemon Thread Groups |
Calling Native Code |
|
Example 10-4 Locking Monitors from C Code |
Figure 10-1. Java Thread Objects Use Native Threads |
A Few Assorted Methods |
Stack Size |
Deprecated Methods |
The Effect of Using a JIT |
Adaptive Compilers |
APIs Used in This Chapter |
The Class java.lang.Thread |
The Class java.lang.ThreadGroup |
Summary |
Chapter 11. Libraries |
The Native Threads Libraries |
Multithreaded Kernels |
|
Figure 11-1. Concurrency within the Kernel |
Symmetric Multiprocessing |
Are Libraries Safe? |
|
Figure 11-2. Using pread() and pwrite() to Keep Track of the File Pointer |
Window Systems |
Figure 11-3. Threads Using invokeLater() with the Swing Toolkit |
Figure 11-4. ThreadedSwing Window Example |
Example 11-1 Using Threads in Swing |
Working with Unsafe Libraries |
Example 11-2 Protecting a HashMap |
Example 11-3 Subclassing an Unsafe Object |
When Should a Class Be Synchronized? |
Synchronized Collections in Java 2 |
Example 11-4 Making Synchronized Collections |
Example 11-5 Protecting an Iterator |
Java's Multithreaded Garbage Collector |
Locks during Finalization |
Summary |
Chapter 12. Design |
Making Libraries Safe and Hot |
Trivial Library Functions |
Example 12-1 Simple MT-Safe Implementation of rand(), Version 1 |
Functions That Maintain State across Invocations |
Example 12-2 Implementing rand() with TSD, Version 2 |
Making malloc() More Concurrent |
Figure 12-1. Current Solaris Implementation of malloc() |
Using Thread-Specific Data to Make malloc() More Concurrent |
Figure 12-2. Threads with Individual TSD malloc() areas |
Using Other Methods to Make malloc() More Concurrent |
Figure 12-3. Threads Using an Array of malloc() Areas. |
Manipulating Lists |
Figure 12-4. Friends/Enemies: Basic Design |
Basic Design |
Single, Global Mutex |
Figure 12-5. Friends/Enemies: Global Mutex Lock |
Example 12-3 Giving Friends Raises (from FriendThread.java) |
Global RWLock with Global Mutex to Protect Salaries |
Figure 12-6. Friends/Enemies: Global RWlock and Salary Lock |
Example 12-4 : giveRaise() (listGlobaRW.java) |
Example 12-5 Removing an Element from the List (ListGlobalRW2.java) |
Global RWLock with Local Mutex to Protect Salaries |
Figure 12-7. Friends/Enemies: Global RWlock and Local Salary Lock |
One Local Lock |
Figure 12-8. Friends/Enemies with Only One Local Mutex Lock |
Example 12-6 Searching Code (ListLocalLock.java) |
Two Local Locks |
Figure 12-9. Friends/Enemies: Two Local Locks |
Local RWLock with Local Mutex to Protect Salaries |
Figure 12-10. Friends/Enemies: Local Lock and RWlock |
Program Design |
Master/Slave |
Client/Server (Thread per Request) |
Example 12-7 Master/Slave Socket Design |
Producer/Consumer |
Example 12-8 Producer/Consumer Socket Design |
Pipeline |
Example 12-9 Pipeline Design |
Client/Server (Thread per Client) |
Example 12-10 Thread per Client Design |
Design Patterns |
Summary |
Chapter 13. RMI |
Remote Method Invocation |
|
Figure 13-1. A Simple RMI Call Sending and Receiving a String |
Sending Remote References |
Figure 13-2. A More Complex RMI Call Sending a Remote Object Reference |
Example 13-1 Simple RMI Server and Client |
Example 13-2 Running ServerRMI |
RMI's Use of Threads |
The Deadlock Problem with RMI |
Figure 13-3. Deadlock by Remote Callback |
Remote Garbage Collection |
Summary |
Chapter 14. Tools |
Static Lock Analyzer |
Using a Thread-Aware, Graphical Debugger |
|
Figure 14-1. Sun's Debugger [Program Stopped in sleep()] |
Proctool |
|
Figure 14-2. Proctool, Main Display Window |
Figure 14-3. Proctool, LWP Display Window |
TNFview |
|
Figure 14-4. Data Collection for TNF |
Figure 14-5. Main Data Display Window for TNF |
Figure 14-6. Histogram Display Window for TNF |
Example 14-1 Code Using TNF Probes in Java |
Summary |
Chapter 15. Performance |
Optimization: Objectives and Objections |
Time to Market |
Available Human Resources and Programming Costs |
Portability |
User Perception |
Competition |
Targeted Machine Configuration |
Algorithm |
CPU Time, I/O Time, Contention, Etc. |
CPU |
Memory Latency |
Memory Bandwidth |
I/O Latency |
Contention |
Throughput vs. Latency |
Figure 15-1. NFS Throughput vs. Latency on Some SGI Machines |
Limits on Speedup |
|
Figure 15-2. Parallel Speedup on Several Numerical Programs |
Figure 15-3. Program Behavior for Parallelized Benchmarks |
Superlinear Speedup |
Timing Threaded and Nonthreaded Programs |
Amdahl's Law |
|
Figure 15-4. Amdahl's Law: Time(total) = Time(serial) + Time(parallel) / Number_of_CPUs |
Figure 15-5. TPC-C Performance of a Sun UE6000 |
Performance Bottlenecks |
|
Figure 15-6. Performance Bottlenecks and Capacities of Programs |
Benchmarks and Repeatable Testing |
Is It Really Faster? |
Table 15-1. Runtimes for Four Trials |
Table 15-2. Runtimes for Ten Trials |
General Performance Optimizations |
Best Algorithm |
Compiler Optimization |
C Compiler Optimization |
Java Compiler Optimization |
Buy Enough RAM |
Minimize I/O |
Minimize Cache Misses |
Any Other Loop Optimizations |
Thread-Specific Performance Optimizations |
Reducing Contention |
Minimizing MT Overhead |
Reducing Paging |
Figure 15-7. Using Threads to Optimize Paging |
Communications Bandwidth |
Right Number of Threads |
Short-Lived Threads |
Dealing with Many Open Sockets |
The Lessons of NFS |
|
Figure 15-8. NFS Throughput on a Series of Sun UE Machines (The performance improvement is somewhat exaggerated, as a two-way UE6000 will outperform a two-way UE 2.) |
Summary |
Chapter 16. Hardware |
Types of Multiprocessors |
Shared Memory Symmetric Multiprocessors |
The CPU |
The System |
Figure 16-1. SMP System Architecture |
Store Barriers |
Bus Architectures |
Direct-Switched Buses |
Figure 16-2. Direct-Switched Memory Bus |
Packet-Switched Buses |
Figure 16-3. Packet-Switched Memory Bus |
Crossbar Switches |
Figure 16-4. Cluster Using a Crossbar Switch |
Hierarchical Interconnects |
Figure 16-5. Hierarchical Design of the SGI Origin Series |
ccNUMA |
Packet-Switched Buses and ldstub |
Figure 16-6. Packet-Switched Memory Bus Running ldstub |
Example 16-1 Spin Locks Done Better |
The Thundering Herds |
LoadLocked/StoreConditional and Compare and Swap |
Example 16-2 Atomic Increment Using LoadLocked and StoreConditional |
Figure 16-7. SMP System Architecture |
Lock-Free Semaphores and Reference Counting |
Volatile: The Rest of the Story |
Atomic Reads and Writes |
Interlocked Instructions |
Memory Systems |
Reducing Cache Misses |
Table 16-1. Selected SPEC Benchmarks for Two UE 3500s |
Cache Blocking |
Data Reorganization |
Word Tearing |
False Sharing |
Example 16-3 False Sharing |
Summary |
Chapter 17. Examples |
Threads and Windows |
|
Example 17-1 ThreadedSwing Program |
Figure 17-1. ThreadedSwing Window Example |
Displaying Things for a Moment (Memory.java) |
|
Figure 17-2. The Memory Game |
Example 17-2 How to Display Something for a Short Time |
Socket Server (Master/Slave Version) |
Socket Server (Producer/Consumer Version) |
|
Example 17-3 Producer/Consumer Socket Program |
Making a Native Call to pthread_setconcurrency() |
|
Example 17-4 Setting the Concurrency Level in Solaris (TimeDiskSetConc.java) |
Actual Implementation of POSIX Synchronization |
|
Example 17-5 Correct Implementation of Mutexes and Condition Variables |
A Robust, Interruptible Server |
|
Example 17-6 A Robust Server |
Disk Performance with Java |
|
Example 17-7 Measuring Disk Access Throughput |
Other Programs on the Web |
Summary |
Appendix A. Internet |
Threads Newsgroup |
Code Examples |
Vendor's Threads Pages |
Threads Research |
Freeware Tools |
Other Pointers |
The Authors on the Net |
Appendix B. Timings |
Timings |
Mutex Lock/Unlock |
Table C-1. Timings of Various Thread-Related Functions on POSIX and Java (?s) |
Explicit Synchronized |
Implicit Synchronized |
Readers/Writer Lock/Unlock |
Semaphore Post/Wait |
notify() |
condSignal() |
Local Context Switch (unbound) |
Local Context Switch (bound) |
Process Context Switch |
Cancellation Disable/Enable |
Test for Deferred Cancellation |
Reference a Global Variable |
Reference Thread-Specific Data |
Reference 'Fake' Thread-Specific Data |
Appendix C. APIs |
Function Descriptions |
The Class java.lang.Thread |
The Interface java.lang.Runnable |
The Class java.lang.Object |
The Class java.lang.ThreadLocal |
The Class java.lang.ThreadGroup |
Helper Classes from Our Extensions LibraryThe Class Extensions.InterruptibleThread |
The Class Extensions.Semaphore |
The Class Extensions.Mutex |
The Class Extensions.ConditionVar |
The Class Extensions.RWLock |
The Class Extensions.Barrier |
The Class Extensions.SingleBarrier |
Glossary |