Dynamic Analysis and Profiling of Multithreaded Systems - Advanced Operating Systems and Kernel Applications


The performance counters and timestamp MSRs are accessed through specialized machine instructions (i.e., RDMSR, WRMSR, and RDTSC) or through higher-level APIs such as the Performance Application Programming Interface (PAPI) (London, Moore, Mucci, Seymour, & Luczak, 2001). A set of control registers is also provided to select which of the available performance monitoring events should be maintained in the available counter set. On-chip performance counters have two advantages: (1) they add no cost beyond that of the off-the-shelf processor and (2) they can be used with very low overhead. For instance, copying the current 64-bit timestamp counter into memory (user or kernel) through the Intel RDTSC instruction costs less than 100 cycles.
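
As a rough illustration of timestamp-based interval measurement, the following minimal sketch reads the timestamp counter before and after a piece of work. It assumes a GCC- or Clang-style compiler that provides the `__rdtsc()` intrinsic in `<x86intrin.h>`; the names `read_timestamp`, `busy_work`, and `measure_interval` are hypothetical helpers introduced here, and the non-x86 fallback to `clock_gettime` is an added assumption rather than part of the original discussion.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* GCC/Clang header that declares __rdtsc() */
#else
#include <time.h>        /* assumed POSIX fallback for non-x86 targets */
#endif

/* Return a timestamp: raw TSC ticks on x86, nanoseconds elsewhere. */
uint64_t read_timestamp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();    /* compiles to the RDTSC instruction */
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical workload: sums the first *arg integers into a volatile
 * variable so the compiler cannot remove the loop. */
void busy_work(void *arg)
{
    unsigned long n = *(unsigned long *)arg;
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < n; i++)
        sink += i;
}

/* Return the number of timestamp ticks that elapse while fn(arg) runs. */
uint64_t measure_interval(void (*fn)(void *), void *arg)
{
    uint64_t start = read_timestamp();
    fn(arg);
    uint64_t end = read_timestamp();
    return end - start;
}
```

A call such as `measure_interval(busy_work, &n)` returns the elapsed ticks for one run of the workload. Because RDTSC is not a serializing instruction, very short intervals measured this way may be skewed by out-of-order execution, so the result is best treated as an approximation.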

Countable events on the Intel Xeon processor include branch predictions, prediction misses, misaligned memory references, cache misses and transfers, I/O bus transactions, memory bus transactions, instruction decoding, micro-op execution, and floating-point assistance. These events are counted on a per-logical-core basis; that is, the Intel performance counter features do not provide any means of differentiating event counts across different threads or processes. Certain architectures, however, such as the IBM PowerPC 604e (IBM Corporation, 1998), do provide the ability to trigger an interrupt when performance counters become negative or wrap around. This interrupt can be filtered on a per-processor basis and used to support a crude means of thread association for infrequent events.

On-chip performance counters have limited use in profiling characteristics specific to multithreaded programming. Nevertheless, on-chip timestamp collection can be useful for measuring execution time intervals (Wolf, 2003). For example, operating system context-switch times can be measured easily by inserting RDTSC into the kernel's context-switching code. Coupling timestamp features with compiler-based instrumentation can be an effective way to measure lock wait and hold times.
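
The lock wait and hold measurement described above might be sketched as follows. This is only an illustration under stated assumptions: the wrapper type `timed_lock` and the functions `lock_ticks`, `timed_lock_init`, `timed_lock_acquire`, and `timed_lock_release` are hypothetical names, POSIX threads are assumed as the locking API, and the wrappers are written by hand here, whereas a compiler-based instrumentation pass would insert equivalent timestamp reads around existing lock calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* GCC/Clang header that declares __rdtsc() */
#else
#include <time.h>        /* assumed POSIX fallback for non-x86 targets */
#endif

/* Hypothetical mutex wrapper that accumulates wait and hold times. */
typedef struct {
    pthread_mutex_t mutex;
    uint64_t acquired_at;   /* timestamp of the most recent acquisition */
    uint64_t total_wait;    /* ticks spent blocked in acquire */
    uint64_t total_hold;    /* ticks spent between acquire and release */
    uint64_t acquisitions;  /* number of successful acquisitions */
} timed_lock;

/* Timestamp source: TSC ticks on x86, nanoseconds elsewhere. */
uint64_t lock_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

void timed_lock_init(timed_lock *l)
{
    int rc = pthread_mutex_init(&l->mutex, NULL);
    assert(rc == 0);
    (void)rc;
    l->acquired_at = 0;
    l->total_wait = 0;
    l->total_hold = 0;
    l->acquisitions = 0;
}

void timed_lock_acquire(timed_lock *l)
{
    uint64_t before = lock_ticks();
    pthread_mutex_lock(&l->mutex);
    uint64_t after = lock_ticks();
    /* The statistics are updated while the mutex is held, so they need
     * no additional synchronization. */
    l->acquired_at = after;
    l->total_wait += after - before;
    l->acquisitions++;
}

void timed_lock_release(timed_lock *l)
{
    /* Record the hold time before giving up the mutex. */
    l->total_hold += lock_ticks() - l->acquired_at;
    pthread_mutex_unlock(&l->mutex);
}
```

Dividing `total_wait` and `total_hold` by `acquisitions` gives average wait and hold times per acquisition. The timestamp reads themselves add a small overhead to every lock operation, which is one reason to keep the instrumented path this short.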

On-Chip Debugging Interfaces and In-Circuit Emulators (ICE)

Performance counters are only useful for counting global events in the system. Additional functionality is therefore needed to perform more powerful inspection of execution and register/memory state. One way to provide this functionality is by augmenting the “normal” target processor with additional functionality. The term in-circuit emulator (ICE) refers to the use of a substitute processor module that “emulates” the target microprocessor and provides additional debugging functionality (Collins, 1997).

ICE modules are usually plugged directly into the microprocessor socket using a specialized adapter, as shown in Figure 12. Many modern microprocessors, however, provide explicit support for ICE, including most x86 and PowerPC-based CPUs. A special debug connector on the motherboard normally provides access to the on-chip ICE features.

Two key standards define debugging functionality adopted by most ICE solutions: JTAG (IEEE, 2001) and the more recent Nexus (IEEE-ISTO, 2003). The Nexus debugging interface is a superset of JTAG and consists of between 25 and 100 auxiliary message-based channels that connect directly to the target processor. The Nexus specification defines a number of different “classes” of support that represent different capability sets composed from the following sets:

• Ownership trace messaging (OTM), which facilitates ownership tracing by providing visibility of which process identity (ID) or operating system task is activated. An OTM is transmitted to indicate when a new process/task is activated, thereby allowing development tools to trace ownership flow. For embedded processors that implement
