Table 1. Comparison between S_LU and S_LUG

uf      n/uf    S_LU    S_LUG   Gain (%)
2^2     2^8     2036    1525    25.09
2^3     2^7     2036    1270    37.62
2^4     2^6     2036    1143    43.86
2^5     2^5     2036    1080    46.95
2^6     2^4     2036    1049    48.47
2^10    1       2036    1023    49.75
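
The figures in Table 1 can be cross-checked with a short simulation. The sketch below is not the derivation of Sections 2.2 and 2.3. It assumes n = 2^10 elements, a base address of 0, unit-stride element addresses, and that LUG visits the uf elements of each unrolled body in reflected Gray-code order. The helper names (switching, lu_order, lug_order, gain_percent) are hypothetical. Under these assumptions it yields the S_LU and S_LUG values listed in the table, and the Gain column matches when the percentage is truncated to two decimals.

```python
def hamming(x, y):
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


def switching(addresses):
    """Total bit flips on the address bus for a sequence of addresses."""
    return sum(hamming(p, q) for p, q in zip(addresses, addresses[1:]))


def gray(i):
    """i-th word of the reflected binary Gray code."""
    return i ^ (i >> 1)


def lu_order(n):
    """Plain unrolled loop: elements are visited in increasing order."""
    return list(range(n))


def lug_order(n, uf):
    """Assumed LUG order: Gray-code offsets inside each unrolled body."""
    return [i + gray(j) for i in range(0, n, uf) for j in range(uf)]


def gain_percent(s_lu, s_lug):
    """Relative reduction in percent, truncated to two decimals."""
    return (s_lu - s_lug) * 10000 // s_lu / 100


n = 2 ** 10  # assumption: uf * (n / uf) = 2^10 in every table row
s_lu = switching(lu_order(n))
for k in (2, 3, 4, 5, 6, 10):
    s_lug = switching(lug_order(n, 2 ** k))
    print(f"uf=2^{k}: S_LU={s_lu} S_LUG={s_lug} "
          f"gain={gain_percent(s_lu, s_lug)}%")
```

With uf = n the whole array is traversed in Gray-code order, so every access flips exactly one address bit and S_LUG reaches its minimum of n - 1 = 1023.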
2.4 Translation to LUG
In Sections 2.2 and 2.3, the expressions for S_LU and S_LUG were derived
assuming base_address(a) = 0. In reality, when the program in Fig. 2 executes,
base_address(a) may not be 0. It may also vary between executions, because it
depends on the system's memory manager, which allocates space for array a at
runtime. The compiler therefore cannot predict the actual base_address(a). The
present work assumes that both b and n are divisible by uf. When array a is
allocated at compile time, the compiler does not know the actual
base_address(a), but it does know the relocatable base address of the array,
which is an offset address. The compiler chooses a relocatable base address
whose bits in the intra-iteration switching positions are all 0, which implies
that b is divisible by uf. If array a is allocated at runtime, the dynamic
memory allocation subroutine can be directed to find a base address such that
b is divisible by uf.
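
The effect of the divisibility condition can be illustrated with a small sketch. It is an assumption-laden illustration, not the allocator or compiler pass used in this work: align_up and intra_iteration_flips are hypothetical helpers, the candidate address is made up, uf is taken to be a power of two, and the accesses inside one unrolled body are assumed to be b + gray(j).

```python
def align_up(addr, uf):
    """Smallest address >= addr that is divisible by uf."""
    return (addr + uf - 1) // uf * uf


def gray(i):
    """i-th word of the reflected binary Gray code."""
    return i ^ (i >> 1)


def intra_iteration_flips(b, uf):
    """Bit flips between consecutive accesses of one unrolled body at base b."""
    body = [b + gray(j) for j in range(uf)]
    return [bin(p ^ q).count("1") for p, q in zip(body, body[1:])]


uf = 4
candidate = 0x1F2E             # hypothetical address, not divisible by uf
b = align_up(candidate, uf)    # 0x1F30: the low log2(uf) bits are now 0

print(hex(b), intra_iteration_flips(b, uf))
print(hex(candidate), intra_iteration_flips(candidate, uf))
```

When b is divisible by uf, adding a Gray-code offset only changes the low log2(uf) bits, so each step inside the body flips a single bit. With the unaligned candidate, the addition carries into the higher bits and one of the steps flips four bits instead.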
3 Experimental Results
The present work is evaluated on five benchmark programs using the XEEMU
simulator [12]. XEEMU is a power-performance simulator that simulates Intel's
XScale processor. Each benchmark program (described in Table 3) has array
initialization loops (as in Fig. 2(a)), which are translated to LUG (as in
Fig. 2(c)). Table 2 shows the reduction in switching activity, execution time,
energy consumed by the translated loop (E_TL), and energy drawn by the address
bus of the dl1 cache (E_dl1-addr_bus) for the programs in Fig. 1. Since
E_dl1-addr_bus is directly proportional to S_LUG, the two are reduced by the
same percentage. Table 4 shows the time taken and energy consumed by the
benchmark programs with the original loop (Org), LU, and LUG. SCount and CSort
with LUG achieve a larger gain in total energy (E_Tot) because their array
initialization time (T_init) is much longer than their computation time
(T_comp). KS, TI, and DFS with LUG show a smaller gain in E_Tot because their
T_init is much shorter than their T_comp. Thus, LUG is best suited to programs
with T_init ≫ T_comp.
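
The dependence of the total gain on the ratio of T_init to T_comp can be made concrete with a simple weighted model. This is only a sketch: it assumes that energy is drawn at the same rate in both phases and that LUG changes the initialization phase alone, which the measurements above do not claim. The function total_gain and all numbers are hypothetical and are not taken from Tables 2 to 4.

```python
def total_gain(loop_gain, t_init, t_comp):
    """Fraction of total energy saved when only the init phase improves.

    loop_gain: fractional energy saving inside the initialization loop.
    t_init, t_comp: initialization and computation times (same unit).
    Assumes equal power in both phases, so energy share equals time share.
    """
    init_share = t_init / (t_init + t_comp)
    return loop_gain * init_share


# Hypothetical programs with the same loop-level saving of 25%.
init_dominated = total_gain(0.25, t_init=9.0, t_comp=1.0)
comp_dominated = total_gain(0.25, t_init=1.0, t_comp=9.0)
print(init_dominated, comp_dominated)
```

In this model the saving in total energy approaches the loop-level saving only when the initialization phase dominates the run time, which is consistent with the larger gains reported for SCount and CSort.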
 