Digital Signal Processing Reference
Compiled Simulation
Mills et al. were the first to use compiled simulation [45]. They presented a simple
and efficient method to translate medium-sized applications into a fast compiled
simulator. Their technique inspired all the later work.
Reshadi et al. compile only the instruction decoding part to achieve the flexibility
of an interpretive simulator while reaching a simulation speed which is close to
compiled simulation [58].
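
The idea of paying the decoding cost once, ahead of execution, can be illustrated with a minimal sketch. This is not the method of Reshadi et al.; the toy instruction set, its 8-bit field layout, and the names decode, encode, and run are all assumptions made for illustration. Each instruction word is decoded a single time into a closure, and the execution loop then only calls the pre-decoded closures.

```python
# Toy ISA (hypothetical): 32-bit words with four 8-bit fields: op | rd | rs | imm.
OP_LOADI, OP_ADD, OP_HALT = 0, 1, 2


def encode(op, rd=0, rs=0, imm=0):
    """Pack the four fields into one instruction word."""
    return (op << 24) | (rd << 16) | (rs << 8) | imm


def decode(word):
    """Decode one instruction word into a closure. Called once per word."""
    op = (word >> 24) & 0xFF
    rd = (word >> 16) & 0xFF
    rs = (word >> 8) & 0xFF
    imm = word & 0xFF
    if op == OP_LOADI:
        def execute(regs):
            regs[rd] = imm
            return True          # keep running
    elif op == OP_ADD:
        def execute(regs):
            regs[rd] = (regs[rd] + regs[rs]) & 0xFFFFFFFF
            return True
    elif op == OP_HALT:
        def execute(regs):
            return False         # stop the simulation
    else:
        raise ValueError(f"unknown opcode {op}")
    return execute


def run(program, num_regs=4):
    """Decode the whole program up front, then execute without re-decoding."""
    decoded = [decode(word) for word in program]
    regs = [0] * num_regs
    pc = 0
    while decoded[pc](regs):
        pc += 1                  # straight-line code only in this sketch
    return regs


program = [
    encode(OP_LOADI, rd=0, imm=5),
    encode(OP_LOADI, rd=1, imm=7),
    encode(OP_ADD, rd=0, rs=1),
    encode(OP_HALT),
]
print(run(program))
```

The execution loop stays as flexible as an interpreter, because the program is still an ordinary data structure that can be re-decoded if it changes, while the per-instruction bit-field extraction disappears from the hot loop.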
SyntSim is a generator for functional simulators [11]. Using profile information,
parts of the executable are compiled. The resulting simulation runs between 2 and
16 times slower than native code.
Errico et al. generate interpretive simulators as well as static and dynamic compiled
simulators from a simulator specification [25]. The dynamic simulator generates C code at
run time, which is compiled by GCC and dynamically loaded. Aligned code pages are
translated identically by the static and the dynamic simulator.
Dynamically Compiled Simulation
SimOS [61] is a full-system simulator that offers various simulators including
Embra [73], a fast dynamic translator. Embra focuses on the efficient simulation of
the memory hierarchy, in particular the efficient simulation of the memory address
translation and memory protection mechanisms, as well as caches. Embra follows
a compile-only approach, i.e., simulated instructions are always translated to native
code and then executed. The translated code fragments for basic blocks are stored
in a translation cache to speed up the look-up of native code blocks during the
simulation. The basic simulator can be extended and adapted using customized
translations. For example, different cache configurations and coherency protocols
are realized using these customized translations.
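
A translation cache of the kind described above can be sketched as follows. This is a minimal illustration, not Embra's implementation: the guest instruction set, the PROGRAM listing, and the names translate_block and simulate are assumptions. Python source generated at run time stands in for native code. Each basic block is translated on first use, stored under its start address, and reused on every later visit.

```python
# Hypothetical guest program: a list of (mnemonic, operands...) tuples.
# A basic block ends at the first "jnz" or "halt".
PROGRAM = [
    ("loadi", 0, 3),    # 0: r0 = 3   (loop counter)
    ("loadi", 1, 0),    # 1: r1 = 0   (accumulator)
    ("addi", 1, 10),    # 2: r1 += 10
    ("addi", 0, -1),    # 3: r0 -= 1
    ("jnz", 0, 2),      # 4: if r0 != 0 goto 2
    ("halt",),          # 5: stop
]


def translate_block(program, start_pc):
    """Emit Python source for the basic block at start_pc and compile it.

    The compiled function updates regs in place and returns the next
    guest pc, or None when the guest program halts.
    """
    lines = ["def block(regs):"]
    pc = start_pc
    while True:
        insn = program[pc]
        op = insn[0]
        if op == "loadi":
            lines.append(f"    regs[{insn[1]}] = {insn[2]}")
        elif op == "addi":
            lines.append(f"    regs[{insn[1]}] += {insn[2]}")
        elif op == "jnz":
            lines.append(
                f"    return {insn[2]} if regs[{insn[1]}] != 0 else {pc + 1}"
            )
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown mnemonic {op!r}")
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["block"]


def simulate(program):
    """Compile-only execution: every block is translated before it runs."""
    cache = {}            # translation cache: guest start pc -> compiled block
    translations = 0
    regs = [0, 0]
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:
            block = translate_block(program, pc)
            cache[pc] = block
            translations += 1
        pc = block(regs)
    return regs, translations


print(simulate(PROGRAM))
```

In this sketch the loop body starting at address 2 runs twice as its own block but is translated only once, which is the saving the cache is meant to provide. A customized translation would correspond to emitting extra lines per instruction in translate_block, for example a call into a cache model on every memory access.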
Shade [18] is another dynamically compiling simulator that aims primarily
at fast execution tracing. It offers a rich interface to trace and process events
during simulation. Similar to the customized translations in Embra, Shade allows
user-defined as well as pre-defined code to collect trace data. Trace collection
is controlled by analyzers that specify whether information should be considered
during tracing on a per-opcode or per-instruction basis. The tracing level, i.e., the
amount of data collected during simulation, can be varied at runtime. In this way,
only critical portions of the program execution need to be executed with full tracing.
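
Analyzer-controlled tracing with a level that changes during the run can be sketched as below. This does not reproduce Shade's interface: the Analyzer class, its three tracing levels, the on_pc switch table, and the toy instruction tuples are all assumptions chosen for illustration. The analyzer selects opcodes to trace, and the amount of data recorded per event depends on the current level.

```python
class Analyzer:
    """Hypothetical analyzer: picks which opcodes to trace and how much to record.

    level 0: tracing off
    level 1: record pc and opcode
    level 2: additionally record a snapshot of the registers
    """

    def __init__(self, traced_opcodes, level=0):
        self.traced_opcodes = set(traced_opcodes)
        self.level = level
        self.trace = []

    def record(self, pc, insn, regs):
        if self.level == 0 or insn[0] not in self.traced_opcodes:
            return
        entry = {"pc": pc, "op": insn[0]}
        if self.level >= 2:
            entry["regs"] = list(regs)   # copy, so later updates do not alter it
        self.trace.append(entry)


def run(program, analyzer, on_pc=None):
    """Execute straight-line code; on_pc maps a pc to a new tracing level."""
    regs = [0, 0]
    for pc, insn in enumerate(program):
        if on_pc and pc in on_pc:
            analyzer.level = on_pc[pc]   # tracing level changed at run time
        op = insn[0]
        if op == "loadi":
            regs[insn[1]] = insn[2]
        elif op == "add":
            regs[insn[1]] += regs[insn[2]]
        else:
            raise ValueError(f"unknown mnemonic {op!r}")
        analyzer.record(pc, insn, regs)
    return regs


program = [
    ("loadi", 0, 1),
    ("loadi", 1, 2),
    ("add", 0, 1),
    ("add", 0, 1),
]
analyzer = Analyzer({"add"}, level=1)
final_regs = run(program, analyzer, on_pc={3: 2})   # full tracing from pc 3 on
print(final_regs, analyzer.trace)
```

Only the last instruction is traced at the most detailed level here, which mirrors the point made above: the expensive register snapshot is taken only in the portion of the run that is of interest.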
Ebcioğlu et al. present sophisticated code generation techniques to efficiently
generate code for parallel processors at runtime [21, 23]. Several code generator
optimizations are presented and combined with runtime statistics collection.
For example, instructions of the target processor are initially interpreted, and
compilation of tree regions is only triggered for hot paths of the simulated
program. Instruction-level parallelism is further improved by aggressive instruction