There are several tools available to visualize and analyze such traces. Viewers
like LTTV (Linux Tracing Toolkit Viewer) 1 , TMF (Tracing and Monitoring
Framework, an Eclipse plugin that is part of Linux Tools) 2 , Jumpshot [ZLGS99] or Triva
[SHN09] give a graphical representation of different runtime aspects (CPU usage,
memory consumption, file accesses, critical path analysis, etc.) of the system
under study using the collected trace logs.
While it is being traced, a process, application or individual component may
have different execution states. For instance, the state of a process may
change successively among new, ready, waiting, running and dead. Efficiently
managing the different state values for the different system attributes (the term
“attribute” is used here to describe the system resources and modules), and
making them quickly available to administrators and monitoring tools at
any requested time, helps to better comprehend the execution or track
system problems, as in [CZG+05, CGK+04]. For example, suppose a problem
or attack is detected, or a performance degradation is reported. In these
cases, having the states of the important system resources at the reported time
(e.g., which processes are running, which files are open, which CPUs are
scheduled, how many bytes are read or written through network devices) can help
administrators better understand the problems and possibly find their root
cause.
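The idea of tracking per-attribute state values from a stream of trace events can be sketched as follows. This is only an illustration under assumed names: the event schema, the `sched_switch`/`exit` event types, and attribute paths such as `Processes/<pid>/status` are hypothetical stand-ins, not the paper's actual format.

```python
# Illustrative sketch (hypothetical event schema): maintaining the
# current state value of each system attribute while reading trace
# events in order. Attribute paths are made up for this example.

current_state = {}

def apply_event(event):
    """Update the current state from one trace event (assumed schema)."""
    if event["type"] == "sched_switch":
        current_state[f"Processes/{event['prev_pid']}/status"] = "ready"
        current_state[f"Processes/{event['next_pid']}/status"] = "running"
    elif event["type"] == "exit":
        current_state[f"Processes/{event['pid']}/status"] = "dead"

events = [
    {"type": "sched_switch", "prev_pid": 0, "next_pid": 42},
    {"type": "sched_switch", "prev_pid": 42, "next_pid": 7},
    {"type": "exit", "pid": 42},
]
for ev in events:
    apply_event(ev)

print(current_state["Processes/42/status"])  # dead
print(current_state["Processes/7/status"])   # running
```

Such a dictionary gives only the state at the current read position; answering queries about arbitrary past timestamps is the harder problem addressed below.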
Re-reading and re-running the trace events, or the checkpoint method (Figure
1), can be used to manage the different state values of the system, as done in the
TMF and LTTV viewers [Mon11]. However, with a large number of system
attributes and a long trace duration (traces up to several hours and in the terabyte
range), these solutions may not be efficient or scalable enough to extract the value of any
given attribute at any given time. As an example, assume a trace viewer aims
to display a histogram of some metric such as the number of interrupts (as a
defined system attribute) within a 1 TB trace. To do so, the viewer may extract
the values at 100 different points (corresponding to the number of available pixels
of the graphical view), and for each point a 10 GB (= 1 TB / 100
points) section of the trace must be read. Re-reading such a large section
of the trace would clearly be unacceptable for interactive browsing. This is
where an efficient state history database may greatly help.
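The checkpoint method mentioned above can be sketched as follows. This is a toy illustration under assumed names: the event representation, the checkpoint interval, and the helper functions are all hypothetical, chosen only to show why a query still has to replay every event between the nearest checkpoint and the requested position.

```python
# Sketch of the checkpoint method (assumed event representation,
# arbitrary toy checkpoint interval): periodic full-state snapshots
# let a query replay only from the nearest earlier snapshot instead
# of from the start of the trace.
import copy

CHECKPOINT_INTERVAL = 2  # events between snapshots (toy value)

def build_checkpoints(events):
    state, checkpoints = {}, {0: {}}
    for i, (attr, value) in enumerate(events, start=1):
        state[attr] = value
        if i % CHECKPOINT_INTERVAL == 0:
            checkpoints[i] = copy.deepcopy(state)
    return checkpoints

def state_at(events, checkpoints, index):
    # Find the last checkpoint at or before `index`, then replay the
    # remaining events; this replay is what becomes costly on large traces.
    base = max(i for i in checkpoints if i <= index)
    state = copy.deepcopy(checkpoints[base])
    for attr, value in events[base:index]:
        state[attr] = value
    return state

events = [("cpu0", "irq"), ("cpu0", "user"), ("cpu1", "idle"), ("cpu0", "syscall")]
cps = build_checkpoints(events)
print(state_at(events, cps, 3))  # {'cpu0': 'user', 'cpu1': 'idle'}
```

With terabyte traces, either the snapshots must be very frequent (large storage cost) or the replayed span between checkpoints is huge (large query cost), which motivates the interval-based approach below.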
The main contribution of this work consists of a generic, scalable and efficient
tree-based state history data structure, and corresponding algorithms to store,
manage and retrieve the different state values for an arbitrary trace size and
any number of system resources. The method works by incrementally building a
data structure to store the state history as it sequentially reads the trace events.
The state values, extracted from the trace events, are stored as intervals in this
state history database. The genericity comes from the fact that the method
does not hard-code the state definitions, either in the viewer modules or in
the tracer. This makes it possible to support different trace formats, and allows
defining different attributes, state variables and values. For scalability purposes,
1 http://lttng.org/LTTV
2 http://www.eclipse.org/linuxtools/projectPages/lttng/
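The interval-based storage just described can be sketched as follows. This is a minimal illustration, not the paper's tree structure: a flat list stands in for the history tree, and all class and attribute names are hypothetical. Each state change closes the previous interval for an attribute and opens a new one, so a stabbing query returns the attribute's value at any past timestamp.

```python
# Minimal sketch of an interval-based state history (a flat list
# stands in for the paper's tree-based structure; names are
# illustrative only).

class StateHistory:
    def __init__(self):
        self.intervals = []   # closed intervals: (start, end, attribute, value)
        self.ongoing = {}     # attribute -> (start, value) of the open interval

    def modify(self, t, attribute, value):
        """A state change at time t closes the previous interval, if any."""
        if attribute in self.ongoing:
            start, old = self.ongoing[attribute]
            self.intervals.append((start, t, attribute, old))
        self.ongoing[attribute] = (t, value)

    def query(self, t, attribute):
        """Stabbing query: the value of `attribute` at time t, or None."""
        for start, end, attr, value in self.intervals:
            if attr == attribute and start <= t < end:
                return value
        if attribute in self.ongoing and self.ongoing[attribute][0] <= t:
            return self.ongoing[attribute][1]
        return None

h = StateHistory()
h.modify(10, "Processes/42/status", "ready")
h.modify(15, "Processes/42/status", "running")
h.modify(30, "Processes/42/status", "dead")
print(h.query(20, "Processes/42/status"))  # running
```

The linear scan in `query` is of course what the paper's tree structure replaces: by organizing the intervals in a tree on disk, a query touches only a logarithmic number of nodes rather than the whole history.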