instructions that would change the shared physical machine's state (such as
interrupt masks, memory maps, and other parts of the system environment), thereby
violating the integrity of separation between guests. This process is complex and
creates CPU overhead. Context switches between virtual machines can require
hundreds or even thousands of clock cycles. Each context switch to a different
virtual machine requires purging cache and translation lookaside buffer (TLB)
contents because identical virtual memory addresses refer to different physical
locations in each guest. Memory latency therefore rises until the caches fill
with fresh content, and that content is discarded in turn when the next time
slice occurs.
In contrast, Logical Domains is designed for, and takes advantage of, the chip
multithreading (CMT) UltraSPARC T1, T2, and T2 Plus processors. These processors
provide many CPU threads, also called strands, on a single processor chip.
Specifically, the UltraSPARC T1 processor provides 8 processor cores with 4
threads per core, for a total of 32 threads on a single processor. The
UltraSPARC T2 and T2 Plus processors provide 8 cores with 8 threads per core,
for a total of 64 threads per chip. From the Oracle Solaris perspective, each
thread is a CPU. This arrangement creates systems that are rich in dispatchable
CPUs, which can be allocated to domains for their exclusive use.
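
The thread counts above are simply cores multiplied by hardware threads per
core. As a minimal sketch (the dictionary and function names are illustrative,
not part of any Solaris or Logical Domains interface), the arithmetic can be
written as:

```python
# Core and thread-per-core figures are the ones quoted in the text.
# The names CMT_PROCESSORS, strands_per_chip, and strand_table are
# hypothetical helpers for this example only.
CMT_PROCESSORS = {
    "UltraSPARC T1": (8, 4),
    "UltraSPARC T2": (8, 8),
    "UltraSPARC T2 Plus": (8, 8),
}


def strands_per_chip(cores, threads_per_core):
    """Return the number of hardware strands on one chip."""
    return cores * threads_per_core


def strand_table():
    """Map each processor name to its strand count per chip."""
    return {
        name: strands_per_chip(cores, threads)
        for name, (cores, threads) in CMT_PROCESSORS.items()
    }


if __name__ == "__main__":
    for name, total in strand_table().items():
        # Solaris presents each strand as one CPU.
        print(f"{name}: {total} CPUs visible to Solaris")
```

Each value in the resulting table is the number of CPUs that Solaris would see
on a single chip of that type.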
Logical Domains technology assigns each domain its own CPUs, which run at
native performance. This design eliminates the frequent context switches that
traditional hypervisors must perform to run multiple guests on a CPU and to
intercept privileged operations. Because each domain has dedicated hardware
circuitry, a domain can change its state (for example, by enabling or disabling
interrupts) without causing a trap and emulation. The assignment of strands to
domains can save thousands of context switches per second, especially for
workloads with high network or disk I/O activity. Context switching still occurs
within a domain when Solaris dispatches different processes onto a CPU, but this
is identical to the way Solaris runs on a non-virtualized server.
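
The idea of dedicating strands to domains, rather than time-slicing guests
across shared CPUs, can be illustrated with a small partitioning sketch. This
is only a toy model: the function name, the domain names, and the sequential
allocation order are assumptions made for the example, and it does not
describe how the actual Logical Domains software selects strands.

```python
def assign_strands(total_strands, requests):
    """Give each domain its own disjoint set of strand IDs.

    total_strands: number of strands on the system (for example, 32 on a T1).
    requests: dict mapping a (hypothetical) domain name to a strand count.
    Returns a dict mapping each domain name to a list of strand IDs.
    """
    if any(count < 1 for count in requests.values()):
        raise ValueError("each domain needs at least one strand")
    if sum(requests.values()) > total_strands:
        # Strands are exclusive, so the system cannot be oversubscribed.
        raise ValueError("not enough strands for exclusive assignment")

    assignment = {}
    next_free = 0
    for domain, count in requests.items():
        # Hand out the next unused block of strand IDs; no ID is reused.
        assignment[domain] = list(range(next_free, next_free + count))
        next_free += count
    return assignment


if __name__ == "__main__":
    layout = assign_strands(32, {"primary": 4, "ldg1": 12, "ldg2": 16})
    for domain, strands in layout.items():
        print(domain, strands[0], "to", strands[-1])
```

Because no strand ID appears in more than one domain, no domain ever has to be
switched off a CPU to make room for another, which is the property the text
relies on.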
One mechanism that CMT systems use to enhance processing throughput is
detection of a cache miss, followed by a hardware context switch. Modern CPUs
use onboard memory called a cache: a very high-speed memory that can be accessed
in just a few clock cycles. If the needed data is present in memory but is not
in this CPU's cache, a cache miss occurs and the CPU must wait dozens or hundreds
of clock cycles, regardless of system architecture. In essence, the CPU affected
by the cache miss stalls until the data is fetched from RAM to cache. On most
systems, the CPU sits idle during the stall, performing no useful work. On those
systems, switching to a different process would require a software context
switch that consumes hundreds or thousands of cycles.
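
The cost of stalling on a cache miss can be estimated with a simple and
deliberately idealized model. It assumes that every strand alternates between a
fixed burst of computation and a fixed miss penalty, and that a core can move
to another ready strand at no cost. The function name and the cycle counts in
the usage example are illustrative assumptions, not measurements.

```python
def core_utilization(strands, compute_cycles, miss_penalty):
    """Estimate the fraction of cycles in which a core does useful work.

    strands: hardware strands sharing the core (1 models a conventional CPU).
    compute_cycles: cycles a strand runs before it misses in the cache.
    miss_penalty: cycles the strand then waits for data from RAM.
    """
    if strands < 1 or compute_cycles <= 0 or miss_penalty < 0:
        raise ValueError("invalid model parameters")
    # A single strand is busy for compute_cycles out of every
    # (compute_cycles + miss_penalty) cycles.
    single = compute_cycles / (compute_cycles + miss_penalty)
    # Additional strands fill the stall time until the core is saturated.
    return min(1.0, strands * single)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "strand(s):", core_utilization(n, 25, 75))
```

With one strand, the core is idle for the entire miss penalty. With several
strands, the stall of one strand can overlap with the computation of the
others, so utilization rises until the core is fully busy.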
In contrast, CMT processors avoid this idle waiting by switching execution to
another CPU strand on the same core. This hardware context switch happens in
a single clock cycle because each hardware strand has its own private hardware