Oracle VM Server for SPARC - Oracle Solaris 10 System Virtualization Essentials


instructions that would change the shared physical machine's state (such as
interrupt masks, memory maps, and other parts of the system environment), thereby
violating the integrity of separation between guests. This process is complex and
creates CPU overhead. Context switches between virtual machines can require
hundreds or even thousands of clock cycles. Each context switch to a different
virtual machine requires purging cache and translation lookaside buffer (TLB)
contents because identical virtual memory addresses refer to different physical
locations. This scheme increases memory latency until the caches become filled
with fresh content, only to be discarded when the next time slice occurs.

In contrast, Logical Domains is designed for, and takes advantage of, the chip
multithreading (CMT) UltraSPARC T1, T2, and T2 Plus processors. These processors
provide many CPU threads, also called strands, on a single processor chip.
Specifically, the UltraSPARC T1 processor provides 8 processor cores with 4
threads per core, for a total of 32 threads on a single processor. The UltraSPARC
T2 and T2 Plus processors provide 8 cores with 8 threads per core, for a total of
64 threads per chip. From the Oracle Solaris perspective, each thread is a CPU.
This arrangement creates systems that are rich in dispatchable CPUs, which can be
allocated to domains for their exclusive use.
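
The thread counts above follow directly from cores multiplied by threads per core. The short Python sketch below restates that arithmetic using only the figures given in the text; the class and variable names are illustrative and do not come from any Oracle tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CmtProcessor:
    name: str
    cores: int
    threads_per_core: int

    @property
    def strands(self) -> int:
        # Each hardware thread (strand) appears to Oracle Solaris as one CPU.
        return self.cores * self.threads_per_core


# Core and thread counts as stated in the text.
PROCESSORS = [
    CmtProcessor("UltraSPARC T1", cores=8, threads_per_core=4),
    CmtProcessor("UltraSPARC T2", cores=8, threads_per_core=8),
    CmtProcessor("UltraSPARC T2 Plus", cores=8, threads_per_core=8),
]

for p in PROCESSORS:
    print(f"{p.name}: {p.cores} cores x {p.threads_per_core} threads = {p.strands} CPUs")
```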

Logical Domains technology assigns each domain its own CPUs, which are used
with native performance. This design eliminates the frequent context switches
that traditional hypervisors must implement to run multiple guests on a CPU and
to intercept privileged operations. Because each domain has dedicated hardware
circuitry, a domain can change its state (for example, by enabling or disabling
interrupts) without causing a trap and emulation. The assignment of strands to
domains can save thousands of context switches per second, especially for
workloads with high network or disk I/O activity. Context switching still occurs
within a domain when Solaris dispatches different processes onto a CPU, but this
is identical to the way Solaris runs on a non-virtualized server.
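
The idea of giving each domain its own strands can be pictured as partitioning the chip's strand IDs into disjoint sets. The Python sketch below is only a model of that idea, not the algorithm the Logical Domains software actually uses; the function name, the domain names, and the strand counts per domain are all hypothetical.

```python
def assign_strands(total_strands, requests):
    """Give each domain a disjoint, contiguous range of strand IDs.

    `requests` maps a (hypothetical) domain name to the number of strands
    it should own. Raises ValueError if a count is not positive or if the
    chip does not have enough strands for all requests.
    """
    if any(count <= 0 for count in requests.values()):
        raise ValueError("every domain needs at least one strand")
    needed = sum(requests.values())
    if needed > total_strands:
        raise ValueError(f"requested {needed} strands, only {total_strands} available")

    assignment = {}
    next_free = 0
    for domain, count in requests.items():
        # No strand ID is handed out twice, so domains never share a CPU.
        assignment[domain] = list(range(next_free, next_free + count))
        next_free += count
    return assignment


# Hypothetical layout on a 64-strand chip (the T2 thread count from the text).
layout = assign_strands(64, {"control": 8, "guest1": 24, "guest2": 32})
for domain, strands in layout.items():
    print(f"{domain}: strands {strands[0]}-{strands[-1]} ({len(strands)} CPUs)")
```

Because the sets never overlap, no guest ever has to be switched off a CPU to make room for another, which is the property the paragraph above relies on.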

One mechanism that CMT systems use to enhance processing throughput is
detection of a cache miss, followed by a hardware context switch. Modern CPUs use
onboard memory called a cache, a very high-speed memory that can be accessed
in just a few clock cycles. If the needed data is present in memory but is not in
this CPU's cache, a cache miss occurs and the CPU must wait dozens or hundreds of
clock cycles on any system architecture. In essence, the CPU affected by the cache
miss stalls until the data is fetched from RAM to cache. On most systems, the CPU
sits idle, not performing any useful work. On those systems, switching to a
different process would require a software context switch that consumes hundreds
or thousands of cycles.
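
A rough calculation shows why a conventional CPU simply waits out a cache miss instead of switching processes. The Python sketch below uses illustrative cycle counts picked from the ranges quoted above; the figure of 50 compute cycles between misses is purely an assumption for the example, and none of the numbers are measurements.

```python
# Illustrative values only, chosen from the ranges given in the text.
MISS_STALL_CYCLES = 100        # "dozens or hundreds" of cycles per cache miss
SOFTWARE_SWITCH_CYCLES = 1000  # "hundreds or thousands" of cycles per switch


def better_to_wait(stall_cycles, switch_cycles):
    """True when idling through the miss costs no more than a software switch."""
    return stall_cycles <= switch_cycles


def stalled_fraction(compute_cycles, stall_cycles):
    """Fraction of time a CPU sits idle if every burst of work ends in a miss."""
    return stall_cycles / (compute_cycles + stall_cycles)


print(better_to_wait(MISS_STALL_CYCLES, SOFTWARE_SWITCH_CYCLES))

# Assumed workload: 50 cycles of useful work between cache misses.
print(f"{stalled_fraction(50, MISS_STALL_CYCLES):.0%} of cycles spent stalled")
```

With these assumed numbers the software switch costs ten times more than the stall it would hide, so the CPU waits, and roughly two-thirds of its cycles do no useful work.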

In contrast, CMT processors avoid this idle waiting by switching execution to
another CPU strand on the same core. This hardware context switch happens in
a single clock cycle because each hardware strand has its own private hardware
