improves performance by avoiding emulation entirely, but at the expense of modifying
guest OSs. In contrast, full virtualization avoids modifying guest OSs, but
at the expense of degraded system performance. For example, VMware uses full
virtualization [60] while Xen employs paravirtualization [9,47].
16.6.3 Emulation
Now that we understand the conditions for virtualizing ISAs and the two main
classes of CPU virtualization, full virtualization and paravirtualization, we move
on to discussing emulation, a major technique for implementing full virtualization
and process VMs. Emulation was introduced in Section 16.5.1. To recap,
emulation is the process of allowing the interfaces and functionalities of one system
(the source) to be implemented on a system with different interfaces and function-
alities (the target). Emulation is the only CPU virtualization mechanism available
when the guest and host ISAs are different. If the guest and host ISAs are identical,
direct native execution can be applied instead.
Emulation is carried out either via interpretation or binary translation. With
interpretation, source instructions are converted to relevant target instructions, one
instruction at a time. Interpretation is relatively slow because it emulates instructions
one by one without applying any optimization technique (e.g., avoiding the
re-interpretation of an already encountered and interpreted instruction). Binary trans-
lation improves upon interpretation by converting whole blocks of source instructions to
target instructions and caching the generated blocks for repeated use. Typically, a block
of instructions is more amenable to optimization than a single instruction. Compared
with interpretation, binary translation is much faster because of block
caching as well as the code optimizations applied over blocks.
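The block-caching idea can be sketched in C as follows. This is a deliberate simplification, not a real binary translator: the "translated" form of a guest block is just a pre-decoded array of operations rather than generated host machine code, and the two-opcode mini-ISA, cache size, and block-length limit are all invented for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Invented two-opcode mini-ISA: bits 31-24 hold the opcode,
 * bits 15-0 an immediate operand. */
enum { OP_ADDI = 0, OP_HALT = 1 };

typedef struct { uint32_t op; int32_t imm; } DecodedInst;

typedef struct {
    size_t start_pc;        /* guest address the block was translated from */
    size_t len;
    DecodedInst insts[16];  /* "translated" block: pre-decoded operations
                             * (assumes blocks of at most 16 instructions) */
    int valid;
} TransBlock;

static TransBlock cache[8];  /* tiny direct-mapped translation cache */
static int translations = 0; /* counts how often translation actually runs */

/* Translate the basic block starting at pc, unless a cached
 * translation already exists, in which case reuse it. */
static TransBlock *translate(const uint32_t *code, size_t pc) {
    TransBlock *b = &cache[pc % 8];
    if (b->valid && b->start_pc == pc) return b;  /* cache hit: reuse */
    translations++;
    b->start_pc = pc;
    b->len = 0;
    b->valid = 1;
    for (;;) {                /* decode until the end of the basic block */
        uint32_t inst = code[pc + b->len];
        b->insts[b->len].op  = inst >> 24;
        b->insts[b->len].imm = (int32_t)(inst & 0xFFFF);
        if (b->insts[b->len++].op == OP_HALT) break;
    }
    return b;
}

static int32_t run(const uint32_t *code) {
    int32_t acc = 0;          /* single guest register */
    size_t pc = 0;
    for (;;) {
        TransBlock *b = translate(code, pc);
        for (size_t i = 0; i < b->len; i++) {  /* execute the cached block */
            if (b->insts[i].op == OP_HALT) return acc;
            acc += b->insts[i].imm;
        }
        pc += b->len;
    }
}
```

Running the same program twice translates its block only once; the second run executes entirely out of the cache, which is where binary translation recoups the cost of translation.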
There are three major interpretation schemes: decode-and-dispatch, indirect-
threaded, and direct-threaded [55]. Basically, an interpreter reads through the
source code instruction by instruction, analyzes each instruction, and calls relevant
routines to generate the target code. This is exactly what the decode-and-dispatch
interpreter does. Figure 16.15 exhibits a snippet of code for a decode-and-dispatch
interpreter used to interpret the PowerPC ISA. As shown, the interpreter is
structured around a central loop and a switch statement. Each instruction is first
decoded (i.e., the extract() function) and subsequently dispatched to a corresponding
routine, which in turn performs the necessary emulation. Clearly, such a decode-
and-dispatch strategy results in a number of direct and indirect branch instructions.
Specifically, each instruction incurs an indirect branch for the switch statement,
a branch to an interpreter routine, and a second indirect branch to return from
that routine. Furthermore, with decode-and-dispatch, every time
the same instruction is encountered, its respective interpreter routine is invoked anew.
This, alongside the excessive branching, tends to greatly degrade performance.
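As a rough illustration of this structure (not the actual code of Figure 16.15), the following C sketch interprets a hypothetical two-instruction mini-ISA; the opcodes, instruction encoding, and single-register guest state are all invented for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Invented two-opcode mini-ISA: bits 31-24 hold the opcode,
 * bits 15-0 an immediate operand. */
enum { OP_ADDI = 0, OP_HALT = 1 };

/* Decode helpers, playing the role of extract() in Figure 16.15. */
static uint32_t extract_opcode(uint32_t inst) { return inst >> 24; }
static int32_t  extract_imm(uint32_t inst)    { return (int32_t)(inst & 0xFFFF); }

/* One interpreter routine per opcode. */
static void emulate_addi(int32_t *acc, uint32_t inst) { *acc += extract_imm(inst); }

/* Decode-and-dispatch: a central loop around a switch statement.
 * Every source instruction pays for the indirect branch of the
 * switch, the branch into its routine, and the return from it. */
static int32_t interpret(const uint32_t *code) {
    int32_t acc = 0;                      /* single guest register */
    for (size_t pc = 0; ; pc++) {
        uint32_t inst = code[pc];
        switch (extract_opcode(inst)) {   /* decode, then dispatch */
        case OP_ADDI: emulate_addi(&acc, inst); break;
        case OP_HALT: return acc;
        }
    }
}
```

Interpreting the three-instruction program { ADDI 5, ADDI 7, HALT } walks the loop three times and returns 12; note that each iteration re-decodes and re-dispatches, even when the same instruction appears repeatedly.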
As an optimization over decode-and-dispatch, the indirect-threaded interpreter
attempts to avoid some of the decode-and-dispatch branches by appending (or
threading) a portion of the dispatch code to the end of each interpreter routine
[20,32,34]. This eliminates most of the branches incurred in decode-and-dispatch,
yet still invokes an interpreter routine every time the same instruction is decoded.
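The threading idea can be sketched in C as follows; the two-opcode mini-ISA and handler names are invented for illustration, and plain recursive calls stand in for the computed jumps a real indirect-threaded interpreter would use.

```c
#include <stdint.h>
#include <stddef.h>

/* Invented two-opcode mini-ISA: bits 31-24 hold the opcode,
 * bits 15-0 an immediate operand. */
enum { OP_ADDI = 0, OP_HALT = 1 };

typedef struct VM {
    const uint32_t *code;
    size_t pc;
    int32_t acc;       /* single guest register */
    int running;
} VM;

typedef void (*Handler)(VM *);
static Handler dispatch_table[2];

/* The dispatch step that each routine "threads" onto its own end:
 * fetch the next instruction and jump straight to its handler,
 * instead of returning to a central decode loop first. */
static void dispatch_next(VM *vm) {
    if (!vm->running) return;
    uint32_t inst = vm->code[vm->pc++];
    dispatch_table[inst >> 24](vm);
}

static void op_addi(VM *vm) {
    vm->acc += (int32_t)(vm->code[vm->pc - 1] & 0xFFFF);
    dispatch_next(vm);  /* threaded dispatch at the end of the routine */
}

static void op_halt(VM *vm) {
    vm->running = 0;    /* no further dispatch: execution stops here */
}

static int32_t run(const uint32_t *code) {
    dispatch_table[OP_ADDI] = op_addi;
    dispatch_table[OP_HALT] = op_halt;
    VM vm = { code, 0, 0, 1 };
    dispatch_next(&vm);
    return vm.acc;
}
```

In a production interpreter the handler-to-handler transfer would be a tail call or computed goto rather than an ordinary recursive call, so the stack does not grow with program length; the recursion here is acceptable only for a short illustrative program.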