The second application program on the RP-2 visualizes the power-saving mechanisms
for multiple cores using Linux. Two mechanisms are implemented in Linux. One
is dynamic voltage and frequency scaling (DVFS) of multiple cores, and the other
is dynamic plugging or unplugging of each CPU core. The two mechanisms
are controlled by the newly introduced “power control manager” daemon. The third
application program performs image processing of magnetic resonance imaging
(MRI) images using the RP-X chip. These three applications are described in
detail below.
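As a rough illustration of the two mechanisms, mainline Linux exposes CPU hotplug and frequency control through sysfs files. The sketch below writes to those files. Whether the power control manager daemon on the RP-2 uses these interfaces is not stated here, so that is an assumption, and the helper names and the `sysfs_root` parameter are hypothetical.

```python
from pathlib import Path


def cpu_dir(cpu, sysfs_root="/sys"):
    """Return the sysfs directory of one CPU core (mainline Linux layout)."""
    return Path(sysfs_root) / "devices" / "system" / "cpu" / f"cpu{cpu}"


def set_cpu_online(cpu, online, sysfs_root="/sys"):
    """Plug (online=True) or unplug (online=False) a core via CPU hotplug."""
    (cpu_dir(cpu, sysfs_root) / "online").write_text("1" if online else "0")


def set_cpu_khz(cpu, khz, sysfs_root="/sys"):
    """Request a core frequency in kHz through cpufreq.

    On a real system this takes effect only when the 'userspace'
    governor is selected for that core.
    """
    path = cpu_dir(cpu, sysfs_root) / "cpufreq" / "scaling_setspeed"
    path.write_text(str(khz))
```

On a real system these writes need root privileges. Pointing `sysfs_root` at a scratch directory allows the logic to be exercised without touching the hardware.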
6.3.1 Load Balancing on RP-1
6.3.1.1 Introduction
The RP-1 chip has four SH-4A cores. The main memory is shared by the four cores.
A pair of caches—an instruction cache and an operand cache—is placed between
each core and the main memory. Each operand cache is kept coherent with the other
operand caches by a directory-based write-invalidation cache coherency protocol,
which is either the MESI protocol or the MSI protocol. Eight inter-CPU interrupt
(ICI) channels are implemented, and communication between cores inside Linux is
mapped to one or more of these channels. An interrupt caused by an event outside
a core can be either bound to a specific core or distributed to an arbitrary core,
in which case the core that receives the interrupt first serves it.
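The write-invalidation idea can be sketched for a single cache line as follows. This is a minimal software model of the MSI states only, assuming four cores. The `Directory` class and its methods are hypothetical names chosen for illustration and do not describe the RP-1 hardware implementation.

```python
class Directory:
    """Tracks the MSI state of one cache line in each core's operand cache."""

    def __init__(self, n_cores=4):
        # "I" = Invalid, "S" = Shared, "M" = Modified
        self.state = ["I"] * n_cores

    def read(self, core):
        # A core holding the line Modified must write it back and
        # downgrade to Shared before another core can read it.
        for other, s in enumerate(self.state):
            if other != core and s == "M":
                self.state[other] = "S"
        if self.state[core] == "I":
            self.state[core] = "S"
        return self.state[core]

    def write(self, core):
        # Write invalidation: every other copy of the line becomes Invalid,
        # and the writer holds the only valid (Modified) copy.
        for other in range(len(self.state)):
            if other != core:
                self.state[other] = "I"
        self.state[core] = "M"
        return self.state[core]
```

MESI adds an Exclusive state so that a core holding the only clean copy can write without first notifying the others. That state is omitted here to keep the sketch short.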
Scalability problems have been found with multiple processors in Linux 2.4, and
they have become more obvious as multithreaded application programs have
become popular. Even on a single processor, the scheduler in Linux 2.4 runs in O(n)
time, where n is the size of the run queue. In symmetric multiprocessing (SMP) on
Linux 2.4, there is a single global run queue protected by a global spinlock, so only
the processor that has acquired the lock may handle the run queue [9]. To
select a task to run, the scheduler searches the run queue for the process with the
highest dynamic priority. This search takes O(n) time and, because it is performed
while holding the global lock, causes the scalability problem in SMP.
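The cost of this design can be seen in a small model. The sketch below is not kernel code: a `threading.Lock` stands in for the global spinlock, and the `Task` class and `pick_next` function are hypothetical names. It only illustrates that every selection scans the whole shared queue while all other processors are locked out.

```python
import threading

runqueue_lock = threading.Lock()  # stands in for the global spinlock
runqueue = []                     # the single run queue shared by all CPUs


class Task:
    def __init__(self, name, dynamic_priority):
        self.name = name
        self.dynamic_priority = dynamic_priority


def pick_next():
    """Select the task with the highest dynamic priority.

    The whole queue is scanned, so the cost is O(n) in the queue length,
    and the scan runs while the global lock is held.
    """
    with runqueue_lock:
        if not runqueue:
            return None
        best = runqueue[0]
        for task in runqueue[1:]:
            if task.dynamic_priority > best.dynamic_priority:
                best = task
        runqueue.remove(best)
        return best
```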
Linux 2.6 has been improved for SMP: it has a per-CPU run queue, which
avoids the global spinlock with multiple CPUs and provides SMP scalability. The
scheduler in Linux 2.6 before version 2.6.23 is called the O(1) scheduler [10]; it was
designed and implemented by Ingo Molnar.
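The constant-time selection can be sketched as below. The model follows the commonly described structure of the O(1) scheduler: one run queue per CPU, each with an active and an expired priority array, and a bitmap marking the non-empty priority levels. It is a simplified illustration, not the kernel's implementation, and the class names are hypothetical.

```python
from collections import deque

NUM_PRIOS = 140  # priority levels in the O(1) scheduler; 0 is the highest


class PrioArray:
    def __init__(self):
        self.bitmap = 0  # bit p is set while queues[p] is non-empty
        self.queues = [deque() for _ in range(NUM_PRIOS)]

    def enqueue(self, task, prio):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def dequeue_best(self):
        if self.bitmap == 0:
            return None
        # Find the lowest set bit, i.e. the best non-empty priority level.
        # The cost depends on NUM_PRIOS, not on the number of tasks.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return task


class RunQueue:
    """One instance per CPU, so no global lock is needed to pick a task."""

    def __init__(self):
        self.active = PrioArray()
        self.expired = PrioArray()

    def pick_next(self):
        # When the active array is drained, swap it with the expired array.
        if self.active.bitmap == 0:
            self.active, self.expired = self.expired, self.active
        return self.active.dequeue_best()


runqueues = [RunQueue() for _ in range(4)]  # e.g. one per core on a four-core chip
```

Because each CPU consults only its own `RunQueue`, the selection work no longer serializes on a shared lock. The trade-off is that queues can become unevenly loaded, which is why a separate load balancer is needed.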
The load balancer in Linux 2.6 supports SMP. Balancing within a scheduling
domain occurs among groups. The RP-1 Linux has one scheduling domain
with four groups, each of which consists of one CPU. The scheduler works
independently on each CPU, so the four-core multiprocessor system has four
schedulers. To keep the load equal across the processors, a load balancer is
run periodically to equalize the workload among them. Each CPU has
a run queue. Each run queue maintains a variable called cpu_load, which represents