Application Programs and Systems - Heterogeneous Multicore Processor Technologies for Embedded Systems

The second application program on the RP-2 visualizes the power-saving mechanisms for multiple cores under Linux. Two mechanisms are implemented in Linux: dynamic voltage and frequency scaling (DVFS) of multiple cores, and dynamic plugging and unplugging of individual CPU cores. Both mechanisms are controlled by a newly introduced “power control manager” daemon. The third application program performs image processing of magnetic resonance imaging (MRI) images on the RP-X chip. These three applications are described in detail below.

6.3.1 Load Balancing on RP-1

6.3.1.1 Introduction

The RP-1 chip has four SH-4A cores that share the main memory. Each core has a pair of caches, an instruction cache and an operand cache, placed between the core and the main memory. The operand caches are kept coherent with one another by a directory-based write-invalidation cache coherency protocol, which can be either the MESI or the MSI protocol. Eight inter-CPU interrupt (ICI) channels are implemented, and communication between cores inside Linux is mapped onto one or more of these channels. An interrupt caused by an event outside a core can either be bound to a specific core or be distributed to any core; in the distributed case, the first core to receive the interrupt services it.

Linux 2.4 has known scalability problems on multiprocessors, and these problems became obvious as multithreaded application programs grew popular. Even on a single processor, the Linux 2.4 scheduler runs in O(n) time, where n is the number of processes in the run queue. In symmetric multiprocessing (SMP) on Linux 2.4, there is a single global run queue protected by a global spinlock, and only the processor that has acquired this lock may access the run queue [9]. To choose the next task to run, the scheduler scans the entire run queue for the process with the highest dynamic priority. This is an O(n) algorithm, and because every processor must hold the same global lock while running it, it causes the scalability problem on SMP.

Linux 2.6 improves SMP support with a per-CPU run queue, which removes contention on a global spinlock among multiple CPUs and lets the scheduler scale on SMP. The Linux 2.6 scheduler used before version 2.6.23 is called the O(1) scheduler [10]; it was designed and implemented by Ingo Molnar.

The load balancer in Linux 2.6 supports SMP. Within a scheduling domain, balancing occurs among groups. The RP-1 Linux has one scheduling domain with four groups, each consisting of one CPU. The scheduler runs independently on each CPU, so the four-core multiprocessor system has four schedulers, each with its own run queue. To keep the load equal across the processors, a load balancer runs periodically and equalizes the workload among them. Each run queue maintains a variable called cpu_load, which represents
