For example, the 80x86 instruction POPF loads the flag register from the top of the stack in
memory. One of the flags is the Interrupt Enable (IE) flag. Before the architecture was extended
to support virtualization, executing POPF in user mode did not trap; it simply changed all the
flags except IE. In system mode, POPF changes the IE flag as well. Since a guest OS runs in user
mode inside a VM, this was a problem: the guest would expect its POPF to change IE, but the
change was silently dropped. Extensions of the 80x86 architecture to support virtualization
eliminated this problem.
Historically, IBM mainframe hardware and VMM took three steps to improve performance
of virtual machines:
1. Reduce the cost of processor virtualization.
2. Reduce interrupt overhead due to virtualization.
3. Reduce interrupt cost by steering interrupts to the proper VM without invoking the VMM.
IBM is still the gold standard of virtual machine technology. For example, an IBM mainframe
ran thousands of Linux VMs in 2000, while Xen ran 25 VMs in 2004 [Clark et al. 2004]. Recent
versions of Intel and AMD chipsets have added special instructions to support devices in a
VM, to mask interrupts at lower levels from each VM, and to steer interrupts to the appropri-
ate VM.
Coherency Of Cached Data
Data can be found in memory and in the cache. As long as the processor is the sole component
reading or changing the data and the cache stands between the processor and memory, there
is little danger of the processor seeing an old or stale copy. As we will see, multiple processors
and I/O devices raise the opportunity for copies to become inconsistent and for the wrong
copy to be read.
The frequency of the cache coherency problem is different for multiprocessors than for I/O.
Multiple data copies are a rare event for I/O—one to be avoided whenever possible—but a
program running on multiple processors will want to have copies of the same data in several
caches. Performance of a multiprocessor program depends on the performance of the system
when sharing data.
The I/O cache coherency question is this: Where does the I/O occur in the computer—between
the I/O device and the cache or between the I/O device and main memory? If input puts data
into the cache and output reads data from the cache, both I/O and the processor see the same
data. The difficulty with this approach is that it interferes with the processor and can cause the
processor to stall for I/O. Input may also interfere with the cache by displacing some informa-
tion with new data that are unlikely to be accessed soon.
The goal for the I/O system in a computer with a cache is to prevent the stale data problem
while interfering as little as possible. Many systems, therefore, prefer that I/O occur directly
to main memory, with main memory acting as an I/O buffer. If a write-through cache were
used, then memory would have an up-to-date copy of the information, and there would be no
stale data issue for output. (This benefit is a reason processors used write through.) Alas, write
through is usually found today only in first-level data caches backed by an L2 cache that uses
write back.
Input requires some extra work. The software solution is to guarantee that no blocks of the
input buffer are in the cache. A page containing the buffer can be marked as noncacheable,
and the operating system can always input to such a page. Alternatively, the operating system
can flush the buffer addresses from the cache before the input occurs. A hardware solution is
to check the I/O addresses on input to see if they are in the cache. If there is a match of I/O
addresses in the cache, the cache entries are invalidated to avoid stale data. All of these ap-
proaches can also be used for output with write-back caches.