Application Programs and Systems - Heterogeneous Multicore Processor Technologies for Embedded Systems

the CPU's load. When run queues are initialized, their cpu_load values are set to zero and updated periodically afterward. The number of runnable tasks on each run queue is held in the nr_running variable. The current run queue's cpu_load variable is set to roughly the average of the current load and the previous load using the statement shown below:

    cpu_load = (cpu_load + nr_running * 128) / 2

The constant 128 increases the resolution of the load calculation and produces a fixed-point number. Because each update averages the new load with the previous value, the cpu_load variable accumulates the recent load history, with older samples weighted progressively less. Load balancing is performed at appropriate times. The load balancer looks for the busiest CPU. If the busiest CPU is the current CPU, the current CPU does nothing, because it is already busy. If the load of the current CPU is less than the average, and the difference in load between the two CPUs exceeds a certain threshold, the current CPU pulls tasks from the busiest CPU. The number of tasks pulled is the smaller of two quantities: the difference between the busiest CPU's load and the average load of the four CPUs, and the difference between the average load of the four CPUs and the current CPU's load [11].
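The load tracking and pull decision described above can be sketched in C. This is a simplified illustration, not the kernel's actual code: the struct rq_sketch type, the function names, and the threshold parameter are hypothetical, and the result is returned in fixed-point load units (dividing by 128 to obtain a task count is an assumption).

```c
#include <assert.h>
#include <stddef.h>

#define LOAD_SCALE  128UL  /* fixed-point scale factor quoted in the text */
#define NR_CPUS_RP1 4      /* the RP-1 has four CPU cores */

/* Hypothetical, simplified stand-in for a kernel run queue. */
struct rq_sketch {
    unsigned long cpu_load;    /* fixed-point load history */
    unsigned long nr_running;  /* runnable tasks on this queue */
};

/* Periodic update: average the previous load with the current scaled load. */
static void update_cpu_load(struct rq_sketch *rq)
{
    rq->cpu_load = (rq->cpu_load + rq->nr_running * LOAD_SCALE) / 2;
}

/*
 * Load (in fixed-point units) that CPU `self` would pull from the busiest
 * CPU, or 0 if no balancing is warranted. `threshold` is a hypothetical
 * parameter; the text only says "a certain threshold".
 */
static unsigned long load_to_pull(const struct rq_sketch rq[NR_CPUS_RP1],
                                  size_t self, unsigned long threshold)
{
    unsigned long total = 0, avg, above, below;
    size_t busiest = 0;

    assert(self < NR_CPUS_RP1);

    for (size_t i = 0; i < NR_CPUS_RP1; i++) {
        total += rq[i].cpu_load;
        if (rq[i].cpu_load > rq[busiest].cpu_load)
            busiest = i;
    }
    avg = total / NR_CPUS_RP1;

    if (busiest == self)                  /* current CPU is the busiest */
        return 0;
    if (rq[self].cpu_load >= avg)         /* current CPU is not below average */
        return 0;
    if (rq[busiest].cpu_load - rq[self].cpu_load <= threshold)
        return 0;                         /* imbalance too small to act on */

    above = rq[busiest].cpu_load - avg;   /* busiest load minus average */
    below = avg - rq[self].cpu_load;      /* average minus current load */
    return above < below ? above : below; /* the smaller of the two */
}
```

For example, with loads of 512, 0, 128, and 128 the average is 192, so an idle CPU #1 would pull the smaller of 320 and 192, that is, 192 load units.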

The purpose of the first application program is to visualize the load balancing

mechanism of Linux. The application program shows that the number of processes

on each CPU core is averaged among the four CPU cores on the RP-1 chip.

6.3.1.2 Design and Implementation

When the application creates several processes, the load balancing mechanism of the Linux kernel distributes them among the four CPU cores. This mechanism should work effectively both when the number of processes is increasing and when it is decreasing.
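One way to observe this distribution from user space is to fork several busy processes and have each report the CPU it is running on. The sketch below is not the RP-1 application itself: the function name sample_cpu_placement and the MAX_CPUS limit are hypothetical, and it relies on the glibc-specific sched_getcpu() call, which requires _GNU_SOURCE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for sched_getcpu() in glibc */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CPUS 64            /* hypothetical upper bound for the histogram */

/*
 * Forks `nprocs` children. Each child spins briefly, then reports the CPU
 * it is currently running on through a pipe. counts[c] receives the number
 * of children seen on CPU c. Returns the number of children that reported,
 * or -1 if the pipe could not be created.
 */
static int sample_cpu_placement(int nprocs, int counts[MAX_CPUS])
{
    int fds[2];
    int reported = 0;
    int cpu;

    assert(nprocs >= 0);
    for (int c = 0; c < MAX_CPUS; c++)
        counts[c] = 0;
    if (pipe(fds) != 0)
        return -1;

    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                           /* stop creating children */
        if (pid == 0) {
            volatile unsigned long spin = 0;
            close(fds[0]);
            /* Burn some CPU time so the scheduler has load to balance. */
            for (unsigned long k = 0; k < 20000000UL; k++)
                spin += k;
            cpu = sched_getcpu();
            if (write(fds[1], &cpu, sizeof cpu) != (ssize_t)sizeof cpu)
                _exit(1);
            _exit(0);
        }
    }

    close(fds[1]);                           /* parent only reads */
    while (read(fds[0], &cpu, sizeof cpu) == (ssize_t)sizeof cpu) {
        if (cpu >= 0 && cpu < MAX_CPUS)
            counts[cpu]++;
        reported++;
    }
    close(fds[0]);
    while (wait(NULL) > 0)                   /* reap all children */
        ;
    return reported;
}
```

Each child samples its CPU only once, so the histogram is a snapshot; the placement can change whenever the kernel rebalances.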

A system diagram of the RP-1 application is shown in Fig. 6.21, and its software architecture is shown in Fig. 6.22. The display unit (“DU” hereafter) on the RP-1 chip is used for visualization. The DU converts the contents of a frame buffer located in main memory into a video signal. The display size is fixed at VGA, 640 × 480 pixels. The display is divided into four sections, each assigned exclusively to one of CPU #0, CPU #1, CPU #2, and CPU #3, as shown in Fig. 6.23. The frame buffer can be located at an arbitrary address. If the system has a dedicated memory area for the frame buffer, the DU driver maps it with the ioremap() function of Linux and uses the resulting virtual address. In this system, the DU driver allocates the frame buffer in main memory (DRAM) using the dma_alloc_coherent() function of Linux. This function allocates one or more physical pages that can be written or read by the processor or a device without worrying about cache effects, and returns a virtual address. Finally, the frame buffer of plane 0 of the DU can be accessed by a user program as a file, “/dev/fb0”.
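A user program might address the four per-CPU sections of the frame buffer roughly as follows. This sketch rests on several assumptions that the text does not confirm: a 2 × 2 grid layout (the actual arrangement is defined by Fig. 6.23), 16-bit pixels, and a driver that supports mmap() on /dev/fb0. All function and type names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define FB_WIDTH  640                      /* VGA width from the text */
#define FB_HEIGHT 480                      /* VGA height from the text */
#define FB_PIXELS ((size_t)FB_WIDTH * FB_HEIGHT)

struct section { int x, y, w, h; };        /* rectangle in pixels */

/*
 * ASSUMED layout: 2 x 2 grid with CPU #0 top-left, #1 top-right,
 * #2 bottom-left, #3 bottom-right.
 */
static struct section section_for_cpu(unsigned cpu)
{
    struct section s;

    assert(cpu < 4);
    s.w = FB_WIDTH / 2;
    s.h = FB_HEIGHT / 2;
    s.x = (int)(cpu % 2) * s.w;
    s.y = (int)(cpu / 2) * s.h;
    return s;
}

/* Fills the section owned by `cpu` with one color (16 bpp is an assumption). */
static void fill_section(uint16_t *fb, unsigned cpu, uint16_t color)
{
    struct section s = section_for_cpu(cpu);

    for (int row = s.y; row < s.y + s.h; row++)
        for (int col = s.x; col < s.x + s.w; col++)
            fb[(size_t)row * FB_WIDTH + col] = color;
}

/*
 * Maps /dev/fb0 into the process, assuming the driver supports mmap().
 * Returns NULL on failure.
 */
static uint16_t *map_framebuffer(void)
{
    void *p;
    int fd = open("/dev/fb0", O_RDWR);

    if (fd < 0)
        return NULL;
    p = mmap(NULL, FB_PIXELS * sizeof(uint16_t),
             PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                             /* the mapping outlives the fd */
    return p == MAP_FAILED ? NULL : (uint16_t *)p;
}
```

Keeping the geometry in section_for_cpu() separate from the device access means the drawing code can be exercised on an ordinary memory buffer when no display hardware is present.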

The application program creates some processes. One process shows a bitmap

image of a penguin on the display. When a penguin process is assigned to CPU #3,
