section can be used to extend profiling capabilities
into a VM environment.
The decision to choose a particular profiling
technique depends upon application requirements.
The following criteria are useful to decide which
approach is appropriate for a given application.
Hardware-Based Profiling Techniques
Previous sections have concentrated on modifications
to program code (e.g., via instrumentation) or to
code that implements the execution environment
(e.g., VM profiling). This section describes hardware
profiling techniques that collect behavioral
information in multithreaded systems, focusing on
two main categories of hardware-based profiling
solutions: on-chip performance counters and
on-chip debugging/in-circuit emulation interfaces.
• Sampling is most effective when there is a
  need to minimize runtime overhead and use
  profiling in production deployments, though
  application-specific logical events may not
  be tracked properly.
• The simplest way to implement profiling is
  by using the JVMTI/CLR profiling interface,
  which has the shortest development time and
  is easy to master. Detailed logical events may
  not be captured, however, and the overhead
  incurred may be heavier than that of
  bytecode/IL instrumentation.
On-Chip Performance Counters
On-chip debugging/profiling interfaces are specialized
circuits that are added to a microprocessor to
collect events and measure time. Modern COTS
processors provide on-chip performance monitoring
and debugging support. On-chip performance-monitoring
support includes selectable counting registers and
time-stamping clocks. The Intel Pentium/Xeon family
of processors and the IBM PowerPC family of
processors both provide these performance-monitoring
features (Intel Corporation, 2006a; IBM Corporation,
1998).
For example, the Intel Xeon processor provides one
64-bit timestamp counter and eighteen 40-bit-wide
Model-Specific Registers (MSRs) as counters (different
processor models have a different number of
performance counters available). Each core (in a
multicore configuration) has its own timestamp
counter and counter registers.
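Because the counter registers in this example are 40 bits wide, a busy counter wraps to zero after 2^40 events, so a profiler that subtracts two raw readings must allow for the wrap. The following is a minimal sketch of that arithmetic only, not of how the registers are read. The names counter_mask and counter_delta are hypothetical helpers introduced for illustration, and the sketch assumes the counter wrapped at most once between the two readings.

```c
#include <assert.h>
#include <stdint.h>

/* Width of the counter registers in the Xeon example above; other
 * processor models may use a different width. */
#define COUNTER_BITS 40u

/* Mask that keeps only the low `bits` bits of a 64-bit value.
 * (Hypothetical helper, not part of any vendor API.) */
static uint64_t counter_mask(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* Number of events between two raw readings of a `bits`-wide counter.
 * Unsigned subtraction followed by masking gives the right answer even
 * if the counter wrapped once between `start` and `end`. */
static uint64_t counter_delta(uint64_t start, uint64_t end, unsigned bits)
{
    return (end - start) & counter_mask(bits);
}
```

For instance, a reading of 5 taken after a reading of 2^40 - 10 corresponds to 15 events, which is what counter_delta((1ULL << 40) - 10, 5, COUNTER_BITS) returns. If the counter may wrap more than once between readings, the readings have to be taken more often or an overflow notification used instead.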
The timestamp counter is incremented at the
processor's clock speed and is constant (at least
in later versions of the processors) across multiple
cores and processors in an SMP environment.
Timestamp counters are initially synchronized
because each is started on the processor RESET
signal. Timestamp counters can be written to
later, however, which can put them out of sync.
Counters must therefore be carefully synchronized
when they are accessed from different threads that
potentially execute on different cores.
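As an illustration of how a timestamp counter is typically used, the sketch below measures the ticks spent in a piece of work on the calling thread. It assumes a GCC- or Clang-compatible compiler, where the __rdtsc() intrinsic from <x86intrin.h> reads the counter on x86. The names read_ticks, ticks_elapsed, and spin are hypothetical, and the non-x86 fallback to a nanosecond clock is an assumption made only so the sketch builds elsewhere.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* __rdtsc() on GCC and Clang */
#endif

/* Current value of a free-running tick counter. On x86 this is the
 * timestamp counter of the core the thread is running on; on other
 * architectures (assumption for portability of this sketch) it is a
 * monotonic clock expressed in nanoseconds. */
static uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Ticks that elapse while work(arg) runs on the calling thread. The
 * result is only meaningful if the thread stays on one core or the
 * counters of the cores involved are in sync. */
static uint64_t ticks_elapsed(void (*work)(void *), void *arg)
{
    uint64_t start = read_ticks();
    work(arg);
    return read_ticks() - start;
}

/* Hypothetical workload: sums the integers below *arg into a volatile
 * variable so the compiler cannot remove the loop. */
static void spin(void *arg)
{
    volatile uint64_t sink = 0;
    uint64_t n = *(uint64_t *)arg;
    for (uint64_t i = 0; i < n; i++)
        sink += i;
}
```

A caller would write, for example, uint64_t n = 1000000; uint64_t t = ticks_elapsed(spin, &n);. The sketch deliberately leaves out two things a production profiler would need: a way to keep the thread on one core (or to verify that the cores' counters agree), and serialization around the reads, since the instruction that reads the timestamp counter is not itself serializing.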
• Bytecode/IL instrumentation is harder to
  implement, but gives the profiler unlimited
  freedom to record any event in the application.
  Implementing a profiler this way is harder than
  using the JVMTI/CLR profiling interface, however,
  and requires detailed knowledge of bytecode/IL.
  Among the different bytecode/IL instrumentation
  approaches, complexity of implementation
  increases from static-time instrumentation to
  load-time to dynamic instrumentation. Dynamic
  instrumentation provides powerful features, such
  as “fix and continue” and runtime problem
  tracking.
• The use of an AOP framework can reduce
  development complexity and increase
  reliability because bytecode/IL need not be
  manipulated directly. Conversely, AOP can
  increase design and deployment overhead,
  which may make it unsuitable for profiling.
  Moreover, application-level events may be
  hard to capture using AOP if join-point
  locations are limited.