For example, the 80x86 instruction POPF loads the flag register from the top of the stack in
memory. One of the flags is the Interrupt Enable (IE) flag. Before the architecture was extended
to support virtualization, executing POPF in user mode did not trap; it simply changed all the
flags except IE. In system mode, POPF changes the IE flag as well. Since a guest OS runs in user
mode inside a VM, this was a problem: the guest would expect its POPF to change IE, but the
change was silently dropped. Extensions of the 80x86 architecture to support virtualization
eliminated this problem.
Historically, IBM mainframe hardware and VMM took three steps to improve performance
of virtual machines:
1. Reduce the cost of processor virtualization.
2. Reduce interrupt overhead due to virtualization.
3. Reduce interrupt cost by steering interrupts to the proper VM without invoking the VMM.
IBM is still the gold standard of virtual machine technology. For example, an IBM mainframe
ran thousands of Linux VMs in 2000, while Xen ran 25 VMs in 2004 [Clark et al. 2004]. Recent
versions of Intel and AMD chipsets have added special instructions to support devices in a
VM, to mask interrupts at lower levels from each VM, and to steer interrupts to the appropri-
ate VM.
Coherency Of Cached Data
Data can be found in memory and in the cache. As long as the processor is the sole component
reading or changing the data and the cache stands between the processor and memory, there
is little danger of the processor seeing an old or stale copy. As we will see, multiple processors
and I/O devices raise the opportunity for copies to become inconsistent and for the wrong
copy to be read.
The frequency of the cache coherency problem is different for multiprocessors than for I/O.
Multiple data copies are a rare event for I/O—one to be avoided whenever possible—but a
program running on multiple processors will want to have copies of the same data in several
caches. Performance of a multiprocessor program depends on the performance of the system
when sharing data.
The I/O cache coherency question is this: Where does the I/O occur in the computer—between
the I/O device and the cache or between the I/O device and main memory? If input puts data
into the cache and output reads data from the cache, both I/O and the processor see the same
data. The difficulty with this approach is that it interferes with the processor and can cause the
processor to stall for I/O. Input may also interfere with the cache by displacing some informa-
tion with new data that are unlikely to be accessed soon.
The goal for the I/O system in a computer with a cache is to prevent the stale data problem
while interfering as little as possible. Many systems, therefore, prefer that I/O occur directly
to main memory, with main memory acting as an I/O buffer. If a write-through cache were
used, then memory would have an up-to-date copy of the information, and there would be no
stale data issue for output. (This benefit is a reason processors used write through.) Alas, write
through is usually found today only in first-level data caches backed by an L2 cache that uses
write back.
Input requires some extra work. The software solution is to guarantee that no blocks of the
input buffer are in the cache. A page containing the buffer can be marked as noncacheable,
and the operating system can always input to such a page. Alternatively, the operating system
can flush the buffer addresses from the cache before the input occurs. A hardware solution is
to check the I/O addresses on input to see if they are in the cache. If there is a match of I/O
addresses in the cache, the cache entries are invalidated to avoid stale data. All of these ap-
proaches can also be used for output with write-back caches.