Energy Efficient Array Initialization Using Loop Unrolling with Partial Gray Code Sequence - VLSI Design and Test - page 90

Table 1. Comparison between S_LU and S_LUG

uf      n/uf    S_LU    S_LUG   Gain (%)
2^2     2^8     2036    1525    25.09
2^3     2^7     2036    1270    37.62
2^4     2^6     2036    1143    43.86
2^5     2^5     2036    1080    46.95
2^6     2^4     2036    1049    48.47
2^10    1       2036    1023    49.75
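
The values in Table 1 can be reproduced with a short simulation. The sketch below is not taken from the paper. It assumes word addresses starting at base address 0, n = 2^10 array elements, the Hamming distance between consecutive addresses as the measure of switching activity, and a binary-reflected Gray code ordering of the accesses inside each unrolled iteration. All function names are illustrative.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def switching(addresses):
    """Total number of bit flips between consecutive addresses."""
    return sum(bin(p ^ q).count("1") for p, q in zip(addresses, addresses[1:]))


def lu_addresses(n, base=0):
    """Plain loop unrolling: elements are visited in ascending order."""
    return [base + i for i in range(n)]


def lug_addresses(n, uf, base=0):
    """Assumed LUG order: Gray-coded offsets within each block of uf elements."""
    return [base + block + gray(j) for block in range(0, n, uf) for j in range(uf)]


def table1(n=1024, ufs=(4, 8, 16, 32, 64, 1024)):
    """Return rows of (uf, n/uf, S_LU, S_LUG, gain in percent)."""
    s_lu = switching(lu_addresses(n))
    rows = []
    for uf in ufs:
        s_lug = switching(lug_addresses(n, uf))
        gain = 100.0 * (s_lu - s_lug) / s_lu
        rows.append((uf, n // uf, s_lu, s_lug, gain))
    return rows


for uf, iters, s_lu, s_lug, gain in table1():
    print(f"{uf:5d} {iters:5d} {s_lu:5d} {s_lug:5d} {gain:6.2f}")
```

Under these assumptions the simulation gives S_LU = 2036 and the same S_LUG values as Table 1. The computed gains can differ from the table in the last digit, because the table appears to truncate rather than round.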

2.4 Translation to LUG

In Sections 2.2 and 2.3, the expressions for S_LU and S_LUG were derived assuming that the base address of the array, base_address(a), is 0. In reality, when the program in Fig. 2 executes, base_address(a) may not be 0. It may also vary between executions, because it depends on the system's memory manager, which allocates space for array a at runtime. The compiler therefore cannot predict the actual base_address(a). The present work assumes that both b and n are divisible by uf. When array a is allocated at compile time, the compiler does not know the actual base_address(a), but it does know the relocatable base address of the array, which is an offset address. The compiler chooses a relocatable base address for which the logic values of the intra-iteration switching bits are 0, which implies that b is divisible by uf. If array a is allocated at runtime, the dynamic memory allocation subroutine can be directed to find a base address such that b is divisible by uf.
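
A small sketch can illustrate why the divisibility condition matters. It assumes that each access address is formed as the base address plus a Gray-coded offset and that uf is a power of two. The helpers align_up and intra_iteration_flips are hypothetical names and belong neither to the paper nor to any allocator API.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def align_up(addr, uf):
    """Smallest multiple of uf that is >= addr (uf must be a power of two)."""
    return (addr + uf - 1) & ~(uf - 1)


def intra_iteration_flips(base, uf):
    """Bit flips between consecutive accesses of one unrolled iteration."""
    addrs = [base + gray(j) for j in range(uf)]
    return [bin(p ^ q).count("1") for p, q in zip(addrs, addrs[1:])]


uf = 4
raw_base = 1001                      # hypothetical address, not divisible by uf
aligned_base = align_up(raw_base, uf)

print(aligned_base)                               # 1004
print(intra_iteration_flips(aligned_base, uf))    # [1, 1, 1]
print(intra_iteration_flips(raw_base, uf))        # [2, 2, 3]
```

With an aligned base the low log2(uf) bits start at 0, so adding a Gray-coded offset never produces a carry and each access flips exactly one address bit. With the unaligned base the additions carry into other bits and the single-bit property is lost.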

3 Experimental Results

The present work is evaluated on five benchmark programs using the XEEMU simulator [12], a power-performance simulator for Intel's XScale processor. Each benchmark program (described in Table 3) has array initialization loops (as in Fig. 2(a)), which are translated to LUG (as in Fig. 2(c)). Table 2 shows the reduction in switching activity, execution time, energy consumed by the translated loop (E_TL), and energy drawn by the address bus of the dl1-cache (E_dl1-addr_bus) for the programs in Fig. 1. Since E_dl1-addr_bus is directly proportional to S_LUG, both are reduced by the same percentage. Table 4 shows the time taken and the energy consumed by the benchmark programs with the original loop (Org), LU, and LUG. SCount and CSort with LUG achieve a larger gain in total energy (E_Tot) because their array initialization time (T_init) is much longer than their computation time (T_comp). KS, TI, and DFS with LUG gain less in E_Tot because their T_init is much shorter than their T_comp. Thus, LUG is best suited to programs with T_init ≥ T_comp.
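
The dependence of the total gain on T_init and T_comp can be illustrated with a rough estimate. The sketch below is an assumption and not the paper's energy model: it supposes that energy is split between initialization and computation in proportion to time, and that LUG reduces only the initialization part. The input values are hypothetical and are not measurements from Table 4.

```python
def total_energy_gain(t_init, t_comp, init_gain):
    """Estimated fractional gain in total energy.

    t_init, t_comp: initialization and computation times (same unit).
    init_gain: fractional energy saving in the initialization part (0..1).
    Assumes energy is shared between the two parts in proportion to time.
    """
    init_share = t_init / (t_init + t_comp)
    return init_share * init_gain


# Hypothetical inputs: the same 25% saving in the initialization loop.
print(total_energy_gain(t_init=9.0, t_comp=1.0, init_gain=0.25))  # init dominates
print(total_energy_gain(t_init=1.0, t_comp=9.0, init_gain=0.25))  # comp dominates
```

In this simplified model, the same saving in the initialization loop yields a far larger total gain when initialization dominates the run time, which is consistent with the trend reported for SCount and CSort versus KS, TI, and DFS.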
