Digital Signal Processing Reference
Compiled Simulation
and efficient method to translate medium sized applications into a fast compiled
simulator. Their technique inspired all the later work.
Reshadi et al. compile only the instruction decoding part to achieve the flexibility
of an interpretive simulator while reaching a simulation speed which is close to
parts of the executable are compiled. The resulting performance is between 2 and
16 times slower than native code.
Errico et al. generate interpretive and both static and dynamic compiled simula-
run time which is compiled by GCC and dynamically loaded. Translation of aligned
code pages is done identically both for the static and the dynamic simulator.
Dynamically Compiled Simulation
the memory hierarchy, in particular the efficient simulation of the memory address
translation and memory protection mechanisms, as well as caches. Embra follows
a compile-only approach, i.e., simulated instructions are always translated to native
code and then executed. The translated code fragments for basic blocks are stored
in a translation cache to speed up the look-up of native code blocks during the
simulation. The basic simulator can be extended and adapted using customized
translations. For example, different cache configurations and coherency protocols
are realized using these customized translations.
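The role of a translation cache can be illustrated with a minimal sketch. It assumes a toy four-opcode instruction set, and every name in it (translate_block, simulate, the register names) is hypothetical. It does not reproduce Embra's code, and it generates Python functions where a real simulator would emit native code. Each basic block is translated once, stored under its start address, and reused on every later visit.

```python
# Toy illustration of a translation cache. The instruction set and all names
# are hypothetical; generated Python functions stand in for native code.

def translate_block(program, pc):
    """Translate the basic block starting at pc into a Python function."""
    lines = ["def block(regs):"]
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "addi":
            lines.append(f"    regs['acc'] += {arg}")
        elif op == "dec":
            lines.append("    regs['cnt'] -= 1")
        elif op == "jnz":
            # A branch ends the block; the function returns the next PC.
            lines.append(f"    return {arg} if regs['cnt'] != 0 else {pc}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["block"]


def simulate(program, regs):
    """Run the program, translating each basic block at most once."""
    cache = {}            # translation cache: block start PC -> function
    translations = 0
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:             # cache miss: translate and remember
            block = translate_block(program, pc)
            cache[pc] = block
            translations += 1
        pc = block(regs)              # execute the translated block
    return regs, translations


program = [
    ("addi", 5),     # 0: acc += 5
    ("dec", None),   # 1: cnt -= 1
    ("jnz", 0),      # 2: loop back to 0 while cnt != 0
    ("halt", None),  # 3: stop
]
regs, translations = simulate(program, {"acc": 0, "cnt": 4})
```

In this run the loop body executes four times but is translated only once, so two translations cover the whole program. A customized translation would correspond to emitting additional lines per instruction, for instance a call into a cache model.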
at fast execution tracing. It offers a rich interface to trace and process events
during simulation. Similar to the customized translations in Embra, Shade allows
user-defined as well as pre-defined code to collect trace data. Trace collection
is controlled by analyzers that specify whether information should be considered
during tracing on a per-opcode or per-instruction basis. The tracing level, i.e., the
amount of data collected during simulation, can be varied at runtime. In this way,
only critical portions of the program execution need to be executed with full tracing.
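The idea of selecting trace data per opcode and changing the tracing level at runtime can be sketched as follows. This is an illustration under assumptions only: the Analyzer class, its level values, and the toy load/add/store machine are all hypothetical and do not reflect Shade's actual interface.

```python
# Hypothetical sketch of opcode-selective tracing with a variable level.
# None of these names are taken from Shade.

class Analyzer:
    def __init__(self, opcodes, level=2):
        self.opcodes = set(opcodes)  # opcodes this analyzer wants to see
        self.level = level           # 0 = off, 1 = opcode only, 2 = with operand
        self.trace = []

    def observe(self, pc, opcode, operand):
        """Record an event if the opcode is selected and tracing is on."""
        if self.level == 0 or opcode not in self.opcodes:
            return
        if self.level == 1:
            self.trace.append((pc, opcode))
        else:
            self.trace.append((pc, opcode, operand))


def run(program, analyzer, start, stop, state):
    """Simulate instructions in [start, stop), reporting each to the analyzer."""
    for pc in range(start, stop):
        opcode, operand = program[pc]
        analyzer.observe(pc, opcode, operand)
        if opcode == "load":
            state["acc"] = state["mem"][operand]
        elif opcode == "add":
            state["acc"] += operand
        elif opcode == "store":
            state["mem"][operand] = state["acc"]
    return state


program = [("load", 0), ("add", 3), ("store", 1),
           ("load", 1), ("add", 1), ("store", 0)]
state = {"acc": 0, "mem": [10, 0]}

analyzer = Analyzer(opcodes={"load", "store"}, level=1)
run(program, analyzer, 0, 3, state)   # inexpensive tracing for the first part
analyzer.level = 2                    # raise the tracing level at runtime
run(program, analyzer, 3, 6, state)   # full tracing for the critical part
```

Only memory instructions reach the trace, and the records from the second half carry the operand as well, which mirrors paying for full tracing only where it matters.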
Ebcioğlu et al. present sophisticated code generation techniques to efficiently
tor optimizations are presented and combined with runtime statistics collection.
For example, instructions of the target processor are initially interpreted, and
compilation of tree regions is only triggered for hot paths of the simulated
program. Instruction-level parallelism is further improved by aggressive instruction