Digital Signal Processing Reference
reached, control is returned to the simulator to resume the interpretive mode of
execution. To avoid unnecessary switches back into interpretive mode, branches in
already translated code are updated whenever a new block is translated. In the case
of self-modifying code or dynamic code loading/generation within the simulated
program, the translation cache has to be invalidated accordingly. Detecting memory
updates that overlap with translated code regions may cause significant overhead.
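One common way to limit the cost of this check is to index translated blocks by the memory pages they occupy, so that a store only has to look at blocks on the pages it touches. The following is a minimal sketch of that idea, not the scheme of any particular simulator; the class name `TranslationCache`, the method names, and the 256-byte page size are all assumptions made for illustration.

```python
PAGE_SHIFT = 8  # hypothetical page size of 256 bytes


class TranslationCache:
    """Toy translation cache keyed by the start address of each block."""

    def __init__(self):
        self.blocks = {}  # start address -> (end address, translated code)
        self.pages = {}   # page number -> set of block start addresses

    def insert(self, start, end, code):
        """Register translated code for the address range [start, end)."""
        self.blocks[start] = (end, code)
        for page in range(start >> PAGE_SHIFT, ((end - 1) >> PAGE_SHIFT) + 1):
            self.pages.setdefault(page, set()).add(start)

    def lookup(self, pc):
        """Return translated code starting at pc, or None if not cached."""
        entry = self.blocks.get(pc)
        return entry[1] if entry else None

    def on_store(self, addr, size=1):
        """Invalidate every block overlapping the write [addr, addr + size)."""
        removed = []
        first = addr >> PAGE_SHIFT
        last = (addr + size - 1) >> PAGE_SHIFT
        for page in range(first, last + 1):
            for start in list(self.pages.get(page, ())):
                end, _ = self.blocks[start]
                if start < addr + size and addr < end:
                    self._remove(start, end)
                    removed.append(start)
        return removed

    def _remove(self, start, end):
        del self.blocks[start]
        for page in range(start >> PAGE_SHIFT, ((end - 1) >> PAGE_SHIFT) + 1):
            self.pages[page].discard(start)
```

A store to a page that holds no translated code costs only a dictionary lookup here; a real simulator would additionally have to undo any branch links that point into an invalidated block.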
4.2 Optimizations
Since compilation time is included in the simulation time, only a few important
optimizations are performed during dynamic code generation. The most important
is register allocation. Registers and other hardware resources of the simulated
architecture are usually stored in memory. At the beginning of a basic block or a
trace the state of the simulated processor is partially loaded into registers. Similarly,
modified values are saved upon leaving the compiled code. The register allocation
strategy typically follows some simple heuristics. For example, some registers of
the simulated processor may be mapped to registers of the host computer without
performing any analysis of the code in order to reduce compilation overhead.
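The load-at-entry, save-at-exit scheme can be sketched as follows. This is an illustrative toy under stated assumptions, not a real code generator: Python local variables stand in for host registers, the simulated state is a dictionary, instructions are hypothetical `(dst, op, src1, src2)` tuples over register names, and results are truncated to 32 bits.

```python
def compile_block(instrs):
    """Translate a basic block into a Python function over a state dict.

    instrs: list of (dst, op, src1, src2) tuples; all operands are
    simulated register names and op is a Python binary operator string.
    """
    reads, writes = [], []
    for dst, op, a, b in instrs:
        for reg in (a, b):
            # Only registers read before being written need a load.
            if reg not in writes and reg not in reads:
                reads.append(reg)
        if dst not in writes:
            writes.append(dst)

    lines = ["def block(state):"]
    # Prologue: load the needed part of the simulated state into locals.
    lines += [f"    {r} = state['{r}']" for r in reads]
    # Body: operate purely on locals (the stand-in for host registers).
    lines += [f"    {d} = ({a} {op} {b}) & 0xFFFFFFFF" for d, op, a, b in instrs]
    # Epilogue: write modified values back to the simulated state.
    lines += [f"    state['{w}'] = {w}" for w in writes]
    lines.append("    return state")

    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["block"]


state = {"r0": 5, "r1": 7, "r2": 0, "r3": 99}
block = compile_block([("r2", "+", "r0", "r1"), ("r0", "*", "r2", "r2")])
block(state)
```

Only `r0` and `r1` are loaded on entry, and only `r2` and `r0` are stored on exit; `r3` is never touched. The mapping is decided by a single linear scan of the block, which keeps the compilation overhead low.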
Processor simulation often involves updating status registers or other information
that is later overwritten by subsequent simulated instructions without any
intervening uses. Liveness analysis can detect whether such computed values are
actually used by a subsequent instruction. For basic blocks and traces, this analysis
is very fast. The gathered information may be used to avoid useless computations
and should lead to significant performance improvements.
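For a single basic block, this analysis is one backward pass. The sketch below assumes a hypothetical instruction representation in which each instruction lists the status flags it writes (`defs`) and reads (`uses`); the function name and the `emit_flags` field are likewise invented for illustration.

```python
def mark_needed_flags(block, live_out):
    """Backward liveness pass over one basic block.

    block:    list of dicts with keys 'op', 'defs' and 'uses', where
              'defs' and 'uses' are sets of status flag names.
    live_out: flags assumed to be live when the block is left.
    Returns a copy of the block in which every instruction carries
    'emit_flags', the subset of its flag updates that is actually used.
    """
    live = set(live_out)
    result = []
    for ins in reversed(block):
        # A flag update is needed only if the flag is live afterwards.
        needed = ins["defs"] & live
        result.append({**ins, "emit_flags": needed})
        # Flags written here are dead before this instruction, unless
        # the instruction itself reads them.
        live = (live - ins["defs"]) | ins["uses"]
    result.reverse()
    return result


block = [
    {"op": "add", "defs": {"Z", "C"}, "uses": set()},
    {"op": "sub", "defs": {"Z", "C"}, "uses": set()},
    {"op": "adc", "defs": {"C"}, "uses": {"C"}},
    {"op": "beq", "defs": set(), "uses": {"Z"}},
]
# Conservatively assume all flags are live at the block exit.
annotated = mark_needed_flags(block, live_out={"Z", "C"})
```

In this example the flag updates of `add` are overwritten by `sub` before anything reads them, so the generated code for `add` can skip them entirely. Treating all flags as live at the block exit is the safe choice when the successors are unknown.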
Better code can be generated when regions of code which contain loops are
optimized and translated to machine code. However, due to the more complex
control flow, optimizing these regions requires more time. Therefore, only the
most frequently executed code regions are considered for this translation method.
For example, a mixed-mode simulator can be used with three different levels of
optimization. Rarely executed code is interpreted, moderately executed code is
optimized at a basic block level, and very frequently executed code is optimized at
a region level [10].
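A three-level policy of this kind can be driven by simple execution counters. The following is a minimal sketch; the threshold values and the names `TierSelector`, `BLOCK_THRESHOLD` and `REGION_THRESHOLD` are assumptions chosen for illustration and are not taken from [10].

```python
from collections import Counter

BLOCK_THRESHOLD = 50      # hypothetical: translate the basic block
REGION_THRESHOLD = 1000   # hypothetical: translate the enclosing region


class TierSelector:
    """Chooses an execution mode from how often a block entry was reached."""

    def __init__(self):
        self.counts = Counter()

    def record(self, pc):
        """Count one execution starting at pc and return the mode to use."""
        self.counts[pc] += 1
        n = self.counts[pc]
        if n >= REGION_THRESHOLD:
            return "region"
        if n >= BLOCK_THRESHOLD:
            return "block"
        return "interpret"
```

The thresholds trade compilation time against the speed of the generated code: the more expensive region-level optimization is paid for only by code that has already shown itself to be hot.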
5 Parallel Simulation
The advent of modern multi-core processors with 8, 16 or even more computing
cores poses several interesting questions as to how an instruction set should be
simulated. One question concerns how the increased computing power of additional
cores should be exploited to speed up the time-consuming simulation. In addition,
DSP cores are increasingly integrated with other computing resources, forming
complex multi-core architectures or even Multi-Processor System-On-Chips (MP-