The unit of logical flow within a running program is a thread. Although the exact definition of a thread can vary, threads are typically defined as a lightweight representation of execution state. The underlying kernel data structure for a thread includes the addresses of its run-time stacks, priority information, and scheduling status. Each thread belongs to a single process (and a process requires at least one thread). Processes define initial code and data, a private virtual address space, and state relevant to active system resources (e.g., files and semaphores). Threads that belong to the same process share the same virtual address space and other system resources. There is no memory protection between threads in the same process, which makes it easy to exchange data efficiently between them. At the same time, however, threads can write to many parts of the process's memory, so data integrity can quickly be lost if access to shared data by individual threads is not controlled carefully.
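Because threads share the enclosing process's memory, even a single shared counter must be guarded. The following minimal sketch, written in C++ with std::thread purely for illustration (the chapter itself prescribes no particular language or API), shows two threads incrementing shared data under a mutex to preserve integrity:

    #include <iostream>
    #include <mutex>
    #include <thread>

    long counter = 0;            // shared by all threads in the process
    std::mutex counter_lock;     // serializes access to the shared data

    void add(int n) {
        for (int i = 0; i < n; ++i) {
            std::lock_guard<std::mutex> guard(counter_lock);
            ++counter;           // without the lock, increments could be lost
        }
    }

    int main() {
        std::thread t1(add, 100000);
        std::thread t2(add, 100000);
        t1.join();
        t2.join();
        std::cout << counter << '\n';   // always 200000 with the lock held
    }

Removing the lock_guard makes the two read-modify-write sequences race, and the final count typically falls short of 200000 in ways that vary from run to run.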
Threads have traditionally been used on single-processor systems to help programmers implement logically concurrent tasks and manage multiple activities within the same program (Rinard, 2001). For example, a program that both handles GUI events and performs network I/O could be implemented with two separate threads that run within the same process, as sketched below. Here the use of threads avoids the need to "poll" for GUI and packet I/O events. It also avoids the need to adjust priorities and preempt running tasks manually, since that work is instead performed by the operating system's scheduler.
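A minimal sketch of this two-thread structure follows. The handle_gui_events() and service_network() bodies are hypothetical stand-ins for real event-loop and socket code, and C++ std::thread is again assumed only for illustration:

    #include <thread>

    void handle_gui_events() {
        // Block waiting for the next GUI event; no polling loop is needed.
        // for (;;) { Event e = wait_for_next_event(); dispatch(e); }
    }

    void service_network() {
        // Block on a socket read; the OS scheduler wakes this thread when
        // data arrives, preempting lower-priority work as necessary.
        // for (;;) { Packet p = blocking_read(); process(p); }
    }

    int main() {
        std::thread gui(handle_gui_events);   // both threads share the
        std::thread net(service_network);     // process's address space
        gui.join();
        net.join();
    }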
With the recent advent of multicore and symmetric multiprocessor (SMP) systems, threads represent logically concurrent program functions that can be mapped to physically parallel processing hardware. For example, a program deployed on a four-way multicore processor must provide at least four independent tasks to fully exploit the available resources (of course, it may not get a chance to use all of the processing cores if they are occupied by higher-priority tasks). As parallel processing capabilities in commodity hardware grow, the need for multithreaded programming grows with them, because explicit design of parallelism in software is now key to exploiting the performance capabilities of next-generation processors (Sutter, 2005).
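One common way to match the number of worker threads to the available hardware parallelism is shown in the sketch below, which assumes the work divides into independent tasks. Note that std::thread::hardware_concurrency() may return 0 when the core count cannot be determined, so the sketch falls back to a single worker:

    #include <thread>
    #include <vector>

    void do_work(unsigned id) {
        // One independent task per hardware core would run here.
    }

    int main() {
        unsigned n = std::thread::hardware_concurrency();
        if (n == 0) n = 1;                     // core count unknown: be conservative

        std::vector<std::thread> workers;
        for (unsigned i = 0; i < n; ++i)       // e.g., four tasks on a
            workers.emplace_back(do_work, i);  // four-way multicore part
        for (auto& t : workers) t.join();
    }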
This chapter reviews key techniques and methodologies that can be used to collect thread-behavior information from running systems. We highlight the strengths and weaknesses of each technique and lend insight into how each can be applied from a practical perspective.
Understanding Multithreaded System Behavior
Building large-scale software systems is both an art and an engineering discipline. Software construction is an inherently iterative process, where system architects and developers iterate between problem understanding and realization of the solution. A superficial understanding of behavior is often insufficient for production systems, particularly mission-critical systems where performance is tightly coupled to variations in the execution environment, such as load on shared resources and hardware clock speeds. Such variations are common in multithreaded systems, where execution is affected directly by resource contention arising from other programs executing at the same time on the same platform. To build predictable and optimized large-scale multithreaded systems, therefore, we need tools that can help improve understanding of software subsystems and help avoid potential chaotic effects that may arise from their broader integration into systems.
Multithreaded programs are inherently complex for several reasons (Lee, 2006; Sutter & Larus, 2005), including: (1) the use of nondeterministic thread scheduling and preemption, as the sketch below illustrates; and (2) control and data dependencies across threads.
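The nondeterminism in point (1) is easy to observe directly. In the minimal C++ sketch below (illustrative only), two threads print interleaved output whose ordering typically differs from run to run, because the scheduler, not the program, decides when each thread executes:

    #include <iostream>
    #include <thread>

    void count(char tag) {
        for (int i = 0; i < 5; ++i)
            std::cout << tag << i << ' ';   // interleaving is scheduler-dependent
    }

    int main() {
        std::thread a(count, 'A');
        std::thread b(count, 'B');
        a.join();
        b.join();
        std::cout << '\n';
    }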
Most commercial-off-the-shelf (COTS) operating systems use priority queue-based, preemptive thread scheduling. The time and space resources