Chapter 1. Introduction
Figure 1-1. Performance for Digital's Alpha Servers (8400 5/625)
Chapter 2. Concepts
Background: Traditional Operating Systems
Figure 2-1. Memory Layout for DOS-Style Operating Systems
Figure 2-2. Memory Layout for Multitasking Systems
Figure 2-3. Processes on a Multitasking System
What Is a Thread?
Figure 2-4. Relationship between a Process and Threads
Figure 2-5. Process Structure and Thread Structures
Kernel Interaction
Concurrency vs. Parallelism
Figure 2-6. Three Threads Running Concurrently on One CPU
Figure 2-7. Three Threads Running in Parallel on Three CPUs
System Calls
Signals
Synchronization
Scheduling
The Value of Using Threads
Parallelism
Figure 2-8. Different Threads Running on Different Processors
Throughput
Figure 2-9. Two Threads Making Overlapping System Calls
Responsiveness
Figure 2-10. Threads Overlapping Calculation and I/O
Communications
Figure 2-11. Different Clients Being Handled by Different Threads
System Resources
Distributed Objects
Figure 2-12. Distributed Objects Running on Distinct Threads
Same Binary for Uniprocessors and Multiprocessors
Program Structure
Figure 2-13. Simplified Flow of Control in Complex Applications
What Kinds of Programs to Thread
Inherently MT Programs
Independent tasks
Servers
Repetitive tasks
Not Obviously MT Programs
Numerical programs
Old code
Automatic Threading
Programs Not to Thread
What About Shared Memory?
Threads Standards
POSIX Threads
Win32 and OS/2 Threads
DCE Threads
Solaris Threads
Performance
Operating Systems
NFS
Figure 2-14. NFS Performance on MP Machines (SPEC '96)
SPECfp 95
Table 2-1. SPECfp95 Results for Alpha 4100 5/466 (SPEC '97)
SPECint_rate95
Figure 2-15. Running SPECrate_fp95 on an SGI Origin/200, 2000 (SPEC '96)
Java Benchmarks
Summary
Chapter 3. Foundations
Implementation vs. Specification
Thread Libraries
The Process Structure
Figure 3-1. Process Structure in Traditional UNIX and in Solaris 2
Lightweight Processes
Figure 3-6. Operation of a System Call
Digital UNIX
Linux
Threads and LWPs
Figure 3-2. Contents of a Thread
Figure 3-3. How the Threads Library Fits into a Process
Figure 3-4. How Java Is Built on Lower-Level Threads Libraries
The POSIX Multithreaded Model
Figure 3-5. POSIX Multithreaded Architecture
System Calls
Signals
Summary
Chapter 4. Lifecycle
Thread Lifecycle
Example 4-1 Simple Call to Create and Exit a POSIX Thread
Example 4-2 Simple Call to Create and Exit a Java Thread
Example 4-3 Simple Call to Create and Exit a Win32 Thread
Exiting a Thread
The Runnable Interface
Example 4-4 Simple Call to Run a Runnable in a Thread
Figure 4-1. Creating a Thread via a Runnable
Example 4-5 Defining run() via an Inner Class in a Thread
Waiting for Threads
Example 4-6 Waiting for Threads to Exit
Figure 4-2. Using thread.join()
Who Am I?
Example 4-7 Getting the Current Thread's Identity
Don't Wait for Threads, Don't Return Status
Exiting the Process
Suspending a Thread
Cancellation
Example 4-8 Cancellation in the Three Libraries
Figure 4-3. Cancellation
ThreadDeath
Garbage Collecting Threads
Zombies
Figure 4-4. Java Thread Create and Join
Is She Still Alive?
Restarting Threads
An Example: Create and Join
The Main Thread Is Different
Example 4-9 Java Create and Join
Example 4-10 Output for Code Example 4-9
Example 4-11 Construct and Start in a Single Line
APIs Used in This Chapter
The Class java.lang.Thread
The Class Extensions.InterruptibleThread
The Interface java.lang.Runnable
Summary
Chapter 5. Scheduling
Different Models of Kernel Scheduling
Many Threads on One LWP
One Thread per LWP
Many Threads on Many LWPs (Strict)
The Two-Level Model
Win32 Fibers
Thread Scheduling
Figure 5-1. The Two Basic Types of Scheduling
Process Contention Scope
Priority Levels
Scheduling States
Active:
Runnable:
Sleeping:
Suspended:
Zombie:
Figure 5-2. Some Process Contention Scope Threads in Various States
Figure 5-3. Simplified View of Thread State Transitions
System Contention Scope
Figure 5-4. Some System Contention Scope Threads in Various States
Context Switching
Figure 5-5. How a Context Switch Works
Figure 5-6. How a Context Switch Works
Preemption
How Many LWPs?
How to Get Those LWPs in Java
Changing Scheduling Parameters for LWPs
nice()
Realtime LWPs
Avoid Realtime
Allocation Domains
Binding LWPs to Processors
Happiness Is a Warm Cache
Figure 5-7. Processor Affinity
Java Scheduling Summary
How Many Threads in Java?
When Should You Care About Scheduling?
APIs Used in This Chapter
The Class java.lang.Thread
Summary
Chapter 6. Synchronization
Synchronization Issues
Example 6-1 Why Synchronization Is Necessary
Atomic Actions and Atomic Instructions
Example 6-2 Pseudo-assembly Code for the Mutual Exclusion Lock
Critical Sections
Lock Your Shared Data!
Synchronization Variables
Mutexes
Example 6-3 Using Mutexes in the Various Libraries
Figure 6-1. Mutex with Several Threads Sleeping on It
Figure 6-2. Execution Graph of the Operation of a Mutex
Figure 6-3. Protecting a Shared List with a Mutex
Example 6-4 Protecting a Shared List with a Mutex (POSIX)
Figure 6-4. All Objects Have Their Own Mutex and Wait Set
Example 6-5 Synchronized Statement
Example 6-6 Using synchronized in Java
Figure 6-5. Each Instance Has Its Own Mutex
Example 6-7 Static Synchronized Methods Also Use the Class Lock
Example 6-8 You May Use an Unrelated Object to Protect Static Data
Example 6-9 You May Use the Class Itself to Protect Static Data
Example 6-10 Protecting a Shared List with a Mutex (Java)
Semaphores
Example 6-11 Basic Use of Counting Semaphores
Figure 6-6. How a Semaphore Operates
Figure 6-7. Execution Graph of the Operation of a Semaphore
Example 6-12 Classic Producer/Consumer Example (one_queue_problem.c)
Example 6-13 Classic Producer/Consumer Example (OneQueueProblem)
Using Barriers to Count Exiting Threads
A Different View of Semaphores
Figure 6-8. Flowchart for Semaphores
Condition Variables
Figure 6-9. Flowchart for Condition Variables
Figure 6-10. Threads Using a Condition Variable
Example 6-14 Using a Condition Variable (POSIX)
Java wait/notify
Example 6-15 Using wait/notify (Java)
Extraneous Contention
Figure 6-11. Extra Contention: When the Mutex Is Held by a Different Thread
InterruptedException
Example 6-16 Using wait/notify with InterruptedException
Controlling the Queue Length
Example 6-17 Classic Producer/Consumer Model (with a Tiny Bug)
POSIX-Style Synchronization in Java
Example 6-18 Classic Producer/Consumer in POSIX
POSIX-Style Mutexes in Java
Example 6-19 Implementing POSIX-Style Mutexes in Java
POSIX-Style Condition Variables in Java
Example 6-20 Implementing Condition Variables in Java
Example 6-21 condWait() Done Wrong
Example 6-22 Producer/Consumer Model Using POSIX-Style Synchronization
A Stoppable Producer/Consumer Example
Example 6-23 Stoppable Producer/Consumer Model
Example 6-24 Stoppable Producer/Consumer Model (Stopper)
Example 6-25 Stoppable Producer/Consumer Model (Starting Up and Shutting Down in main()
APIs Used in This Chapter
The Class java.lang.Object
The Class Extensions.Semaphore
The Class Extensions.Mutex
The Class Extensions.ConditionVar
Summary
Chapter 7. Complexities
Complex Locking Primitives
Readers/Writer Locks
Figure 7-1. How Readers/Writer Locks Work
Figure 7-2. Execution Graph for Readers/Writer Locks
Example 7-1 Readers/Writer Locks in Java
Priority Inheritance Mutexes
Figure 7-3. Priority Inversion
FIFO Mutexes
Figure 7-4. When FIFO Mutexes Are Valuable
Recursive Mutexes
Example 7-2 Recursive Locking Calls in POSIX and Java
Nonblocking Synchronization
Spin Locks
Example 7-3 Simple Spin Lock
Adaptive Spin Locks
Java May Use Spin Locks
Timeouts
Elvis and the UFOs
Example 7-4 Recalculating Timeouts
Other Synchronization Variables
Join
Barriers
Figure 7-5. Barriers
Single Barriers
Figure 7-6. Single Barriers
Example 7-5 Implementing Single Barriers in Java
Win32 Event Objects
Win32 Critical Sections
Multiple Wait Semaphores
Interlocked Instructions
Message Queues
Win32 I/O Completion Ports
Communicating via Streams
Volatile
Performance
Condition Variables vs. wait/notify
Coarse vs. Fine Grain Locking
What to Lock
Figure 7-7. Friends/Enemies with Only One Local Mutex Lock
Double-Checked Locking
Example 7-6 Double-Checked Locking
Synchronization Problems
Deadlocks
Example 7-7 Deadlock in Java
Figure 7-8. Typical Deadlock
Example 7-8 Locking Mutexes Out of Order
Race Conditions
Example 7-9 Simplistic Race Condition
Recovering from Deadlocks
The Lost Wakeup
Example 7-10 The Lost Wakeup Problem
InterruptedException
Example 7-11 Ignoring InterruptedException
Example 7-12 Propagating InterruptedException
Example 7-13 Ignoring InterruptedException
APIs Used in This Chapter
The Class Extensions.RWLock
The Class Extensions.Barrier
The Class Extensions.SingleBarrier
Summary
Chapter 8. TSD
Thread-Specific Data
Figure . Thread-Specific Data in POSIX
Example 8-1 Use of POSIX TSD
Example 8-2 Dynamic TLS in Win32
Java TSD
Example 8-3 Implementing TSD by Subclassing Thread
Example 8-4 Using ThreadLocal in Java
Example 8-5 Recording the Runnable in the Thread
Example 8-6 Recording the Runnable in Thread Local Storage
APIs Used in This Chapter
The Class java.lang.ThreadLocal
Summary
Chapter 9. Cancellation
What Cancellation Is
Figure 9-1. Thread Cancellation
Example 9-1 Asynchronous Thread Cancellation
Polling for Cancellation
Asynchronous Cancellation
Deferred Cancellation
Using interrupt() for Deferred Cancellation
Progressive Shutdown
interrupt()
Example 9-2 Using thread.interrupt()
Don't Call stop()
ThreadDeath
Using stop() to Implement Thread.exit()
Example 9-3 Implementing exit()
Never Exit a Thread!
Don't Call destroy()
Example 9-4 From ThreadedSwing Example: NumericButtonListener.java
Defined Cancellation/Interruption Points
The Problem with I/O
Not Cancelling upon Interruption
Handling Interrupts
Disabling Interrupts
Example 9-5 Testing a Variable from an Exception Handler
Ignore Interrupts
Example 9-6 Inventing an InterruptDisabled Protocol
Exit on interrupt()
Propagate InterruptedException
Reinterrupt
Example 9-7 Calling interrupt() upon Return
Example 9-8 Naive Condition Variable and Readers/Writer Lock
Example 9-9 Handling Interruptions from condWait() the Hard Way
Example 9-10 The Right Way of Implementing condWait()
Cancellation State
A Cancellation Example
Figure 9-2. Cancellation Example Program
Example 9-11 Using interrupt() to Cancel Searcher Threads
Using Cancellation
Ensuring Bounded CPU Time
Example 9-12 Deferred Cancellation as Polling
Interrupting Computational Loops
Example 9-13 Testing Once Every 1000 Iterations
Interrupting Blocked Threads
Interrupting Sockets
What Should Throw InterruptedException?
Interrupting Sleeping Threads
Interruption in wait()
Example 9-14 Interrupting a wait()
The Morning After
Example 9-15 From the main() Method for Searcher Example
Cleanup
Example 9-16 How POSIX Cleanup Handlers Are Used
Example 9-17 How InterruptedException Handlers Clean Up
Implementing enableInterrupts()
Example 9-18 Implementing enableInterrupts()
A Cancellation Example (Improved)
Example 9-19 Cleaner Version of doDatabaseThing()
Simple Polling
Example 9-20 1Implementing the Searcher with Polling
APIs Used in This Chapter
The Class java.lang.Thread
The Class Extensions.InterruptibleThread
Summary
Chapter 10. Details
Thread Groups
Thread Security
Example 10-1 Checking for Security Violations
Example 10-2 A Simple Security Exception
Table 10-1. Thread Class Methods That May Cause a Security Check
Table 10-2. ThreadGroup Class Methods That May Cause a Security Check
Real-World Examples
1. The garbage collector thread and the finalize() method
2. Performance: Access to synchronized objects
3. More problems with the garbage collection thread
4. Make synchronized code sections as small as possible
Example 10-3 Synchronizing Part of a Method>
5. Threaded class downloads
General Tips and Hints
Daemon Threads
Daemon Thread Groups
Calling Native Code
Example 10-4 Locking Monitors from C Code
Figure 10-1. Java Thread Objects Use Native Threads
A Few Assorted Methods
Stack Size
Deprecated Methods
The Effect of Using a JIT
Adaptive Compilers
APIs Used in This Chapter
The Class java.lang.Thread
The Class java.lang.ThreadGroup
Summary
Chapter 11. Libraries
The Native Threads Libraries
Multithreaded Kernels
Figure 11-1. Concurrency within the Kernel
Symmetric Multiprocessing
Are Libraries Safe?
Figure 11-2. Using pread() and pwrite() to Keep Track of the File Pointer
Window Systems
Figure 11-3. Threads Using invokeLater() with the Swing Toolkit
Figure 11-4. ThreadedSwing Window Example
Example 11-1 Using Threads in Swing
Working with Unsafe Libraries
Example 11-2 Protecting a HashMap
Example 11-3 Subclassing an Unsafe Object
When Should a Class Be Synchronized?
Synchronized Collections in Java 2
Example 11-4 Making Synchronized Collections
Example 11-5 Protecting an Iterator
Java's Multithreaded Garbage Collector
Locks during Finalization
Summary
Chapter 12. Design
Making Libraries Safe and Hot
Trivial Library Functions
Example 12-1 Simple MT-Safe Implementation of rand(), Version 1
Functions That Maintain State across Invocations
Example 12-2 Implementing rand() with TSD, Version 2
Making malloc() More Concurrent
Figure 12-1. Current Solaris Implementation of malloc()
Using Thread-Specific Data to Make malloc() More Concurrent
Figure 12-2. Threads with Individual TSD malloc() areas
Using Other Methods to Make malloc() More Concurrent
Figure 12-3. Threads Using an Array of malloc() Areas.
Manipulating Lists
Figure 12-4. Friends/Enemies: Basic Design
Basic Design
Single, Global Mutex
Figure 12-5. Friends/Enemies: Global Mutex Lock
Example 12-3 Giving Friends Raises (from FriendThread.java)
Global RWLock with Global Mutex to Protect Salaries
Figure 12-6. Friends/Enemies: Global RWlock and Salary Lock
Example 12-4 : giveRaise() (listGlobaRW.java)
Example 12-5 Removing an Element from the List (ListGlobalRW2.java)
Global RWLock with Local Mutex to Protect Salaries
Figure 12-7. Friends/Enemies: Global RWlock and Local Salary Lock
One Local Lock
Figure 12-8. Friends/Enemies with Only One Local Mutex Lock
Example 12-6 Searching Code (ListLocalLock.java)
Two Local Locks
Figure 12-9. Friends/Enemies: Two Local Locks
Local RWLock with Local Mutex to Protect Salaries
Figure 12-10. Friends/Enemies: Local Lock and RWlock
Program Design
Master/Slave
Client/Server (Thread per Request)
Example 12-7 Master/Slave Socket Design
Producer/Consumer
Example 12-8 Producer/Consumer Socket Design
Pipeline
Example 12-9 Pipeline Design
Client/Server (Thread per Client)
Example 12-10 Thread per Client Design
Design Patterns
Summary
Chapter 13. RMI
Remote Method Invocation
Figure 13-1. A Simple RMI Call Sending and Receiving a String
Sending Remote References
Figure 13-2. A More Complex RMI Call Sending a Remote Object Reference
Example 13-1 Simple RMI Server and Client
Example 13-2 Running ServerRMI
RMI's Use of Threads
The Deadlock Problem with RMI
Figure 13-3. Deadlock by Remote Callback
Remote Garbage Collection
Summary
Chapter 14. Tools
Static Lock Analyzer
Using a Thread-Aware, Graphical Debugger
Figure 14-1. Sun's Debugger [Program Stopped in sleep()]
Proctool
Figure 14-2. Proctool, Main Display Window
Figure 14-3. Proctool, LWP Display Window
TNFview
Figure 14-4. Data Collection for TNF
Figure 14-5. Main Data Display Window for TNF
Figure 14-6. Histogram Display Window for TNF
Example 14-1 Code Using TNF Probes in Java
Summary
Chapter 15. Performance
Optimization: Objectives and Objections
Time to Market
Available Human Resources and Programming Costs
Portability
User Perception
Competition
Targeted Machine Configuration
Algorithm
CPU Time, I/O Time, Contention, Etc.
CPU
Memory Latency
Memory Bandwidth
I/O Latency
Contention
Throughput vs. Latency
Figure 15-1. NFS Throughput vs. Latency on Some SGI Machines
Limits on Speedup
Figure 15-2. Parallel Speedup on Several Numerical Programs
Figure 15-3. Program Behavior for Parallelized Benchmarks
Superlinear Speedup
Timing Threaded and Nonthreaded Programs
Amdahl's Law
Figure 15-4. Amdahl's Law: Time(total) = Time(serial) + Time(parallel) / Number_of_CPUs
Figure 15-5. TPC-C Performance of a Sun UE6000
Performance Bottlenecks
Figure 15-6. Performance Bottlenecks and Capacities of Programs
Benchmarks and Repeatable Testing
Is It Really Faster?
Table 15-1. Runtimes for Four Trials
Table 15-2. Runtimes for Ten Trials
General Performance Optimizations
Best Algorithm
Compiler Optimization
C Compiler Optimization
Java Compiler Optimization
Buy Enough RAM
Minimize I/O
Minimize Cache Misses
Any Other Loop Optimizations
Thread-Specific Performance Optimizations
Reducing Contention
Minimizing MT Overhead
Reducing Paging
Figure 15-7. Using Threads to Optimize Paging
Communications Bandwidth
Right Number of Threads
Short-Lived Threads
Dealing with Many Open Sockets
The Lessons of NFS
Figure 15-8. NFS Throughput on a Series of Sun UE Machines (The performance improvement is somewhat exaggerated, as a two-way UE6000 will outperform a two-way UE 2.)
Summary
Chapter 16. Hardware
Types of Multiprocessors
Shared Memory Symmetric Multiprocessors
The CPU
The System
Figure 16-1. SMP System Architecture
Store Barriers
Bus Architectures
Direct-Switched Buses
Figure 16-2. Direct-Switched Memory Bus
Packet-Switched Buses
Figure 16-3. Packet-Switched Memory Bus
Crossbar Switches
Figure 16-4. Cluster Using a Crossbar Switch
Hierarchical Interconnects
Figure 16-5. Hierarchical Design of the SGI Origin Series
ccNUMA
Packet-Switched Buses and ldstub
Figure 16-6. Packet-Switched Memory Bus Running ldstub
Example 16-1 Spin Locks Done Better
The Thundering Herds
LoadLocked/StoreConditional and Compare and Swap
Example 16-2 Atomic Increment Using LoadLocked and StoreConditional
Figure 16-7. SMP System Architecture
Lock-Free Semaphores and Reference Counting
Volatile: The Rest of the Story
Atomic Reads and Writes
Interlocked Instructions
Memory Systems
Reducing Cache Misses
Table 16-1. Selected SPEC Benchmarks for Two UE 3500s
Cache Blocking
Data Reorganization
Word Tearing
False Sharing
Example 16-3 False Sharing
Summary
Chapter 17. Examples
Threads and Windows
Example 17-1 ThreadedSwing Program
Figure 17-1. ThreadedSwing Window Example
Displaying Things for a Moment (Memory.java)
Figure 17-2. The Memory Game
Example 17-2 How to Display Something for a Short Time
Socket Server (Master/Slave Version)
Socket Server (Producer/Consumer Version)
Example 17-3 Producer/Consumer Socket Program
Making a Native Call to pthread_setconcurrency()
Example 17-4 Setting the Concurrency Level in Solaris (TimeDiskSetConc.java)
Actual Implementation of POSIX Synchronization
Example 17-5 Correct Implementation of Mutexes and Condition Variables
A Robust, Interruptible Server
Example 17-6 A Robust Server
Disk Performance with Java
Example 17-7 Measuring Disk Access Throughput
Other Programs on the Web
Summary
Appendix A. Internet
Threads Newsgroup
Code Examples
Vendor's Threads Pages
Threads Research
Freeware Tools
Other Pointers
The Authors on the Net
Appendix B. Timings
Timings
Mutex Lock/Unlock
Table C-1. Timings of Various Thread-Related Functions on POSIX and Java (?s)
Explicit Synchronized
Implicit Synchronized
Readers/Writer Lock/Unlock
Semaphore Post/Wait
notify()
condSignal()
Local Context Switch (unbound)
Local Context Switch (bound)
Process Context Switch
Cancellation Disable/Enable
Test for Deferred Cancellation
Reference a Global Variable
Reference Thread-Specific Data
Reference 'Fake' Thread-Specific Data
Appendix C. APIs
Function Descriptions
The Class java.lang.Thread
The Interface java.lang.Runnable
The Class java.lang.Object
The Class java.lang.ThreadLocal
The Class java.lang.ThreadGroup
Helper Classes from Our Extensions LibraryThe Class Extensions.InterruptibleThread
The Class Extensions.Semaphore
The Class Extensions.Mutex
The Class Extensions.ConditionVar
The Class Extensions.RWLock
The Class Extensions.Barrier
The Class Extensions.SingleBarrier
Glossary