Table 1. Comparison between S_LU and S_LUG

uf      n/uf    S_LU    S_LUG   Gain(%)
2^2     2^8     2036    1525    25.09
2^3     2^7     2036    1270    37.62
2^4     2^6     2036    1143    43.86
2^5     2^5     2036    1080    46.95
2^6     2^4     2036    1049    48.47
2^10    1       2036    1023    49.75
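The figures in Table 1 can be reproduced with a short sketch, under assumptions that are inferred from the numbers rather than stated in this section: switching activity is taken as the total Hamming distance between consecutive addresses on the bus, the base address is 0, n = 1024, and LUG is assumed to visit the uf addresses of each unrolled iteration in Gray-code order. The function names are hypothetical.

```python
def hamming(x, y):
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


def switching(addresses):
    """Total bit flips between consecutive addresses (assumed meaning of S)."""
    return sum(hamming(p, q) for p, q in zip(addresses, addresses[1:]))


def sequential_order(n, base=0):
    """Addresses visited in plain ascending order."""
    return [base + i for i in range(n)]


def gray_unrolled_order(n, uf, base=0):
    """Assumed LUG order: Gray-code offsets inside each block of uf elements."""
    order = []
    for start in range(0, n, uf):
        for j in range(uf):
            order.append(base + start + (j ^ (j >> 1)))  # j-th Gray code
    return order


def gain_percent(s_lu, s_lug):
    """Relative reduction of S_LUG with respect to S_LU, in percent."""
    return 100.0 * (s_lu - s_lug) / s_lu


n = 1024
s_lu = switching(sequential_order(n))
for k in (2, 3, 4, 5, 6, 10):
    uf = 1 << k
    s_lug = switching(gray_unrolled_order(n, uf))
    print(uf, n // uf, s_lu, s_lug, gain_percent(s_lu, s_lug))
```

Under these assumptions the sketch yields S_LU = 2036 and the S_LUG values 1525, 1270, 1143, 1080, 1049 and 1023. The computed gains agree with the Gain(%) column once truncated to two decimals; for example, 511/2036 is about 25.098%, which the table lists as 25.09.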
2.4 Translation to LUG
In Sections 2.2 and 2.3 the expressions for S_LU and S_LUG were derived, respectively, assuming that base address(a) is 0. In reality, when the program in Fig. 2 executes, base address(a) may not be 0. It may vary between executions because it depends on the system's memory manager, which allocates space for array a at runtime. So it is not possible for a compiler to predict the actual base address(a). The present work assumes that both b and n are divisible by uf. When the array a is allocated at compile time, the compiler does not know the actual base address(a), but it knows the relocatable base address of the array, which is an offset address. The compiler finds a relocatable base address such that the logic values corresponding to the intra-iteration switching bits are 0, which implies that b is divisible by uf. If the array a is allocated at runtime, the dynamic memory allocation subroutine can be directed to find a base address such that b is divisible by uf.
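The effect of the divisibility requirement can be illustrated with a minimal sketch. It assumes that uf is a power of two, that addresses are counted in array elements, and that each unrolled iteration visits its uf elements in Gray-code order; none of these details is stated in this section, and the helper names are hypothetical.

```python
def hamming(x, y):
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


def align_up(address, uf):
    """Smallest multiple of uf that is greater than or equal to address."""
    return ((address + uf - 1) // uf) * uf


def intra_iteration_switching(b, uf):
    """Bit flips inside one unrolled iteration that starts at base address b.

    Assumes the iteration accesses b + gray(j) for j = 0 .. uf - 1.
    """
    addrs = [b + (j ^ (j >> 1)) for j in range(uf)]
    return sum(hamming(p, q) for p, q in zip(addrs, addrs[1:]))


uf = 4
unaligned_b = 3                      # low two bits are not 0
aligned_b = align_up(unaligned_b, uf)  # 4, low two bits are 0

print(intra_iteration_switching(unaligned_b, uf))  # 6
print(intra_iteration_switching(aligned_b, uf))    # 3
```

With b = 4 the low log2(uf) bits of the base are 0, so adding a Gray-code offset never produces a carry and every step flips exactly one bit (3 flips for uf = 4). With b = 3 the additions carry into higher bits and the same iteration costs 6 flips, which is why the base address is chosen as a multiple of uf.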
3 Experimental Results
The present work is evaluated on five benchmark programs on the XEEMU simulator [12]. XEEMU is a power-performance simulator that simulates Intel's XScale processor. Each benchmark program (described in Table 3) has array initialization loops (as in Fig. 2(a)), which are translated to LUG (as in Fig. 2(c)). Table 2 shows the reduction in switching activity, execution time, energy consumed by the translated loop (E_TL) and energy drawn by the address bus of the dl1-cache (E_{dl1-addr bus}) for the programs in Fig. 1. Since E_{dl1-addr bus} is directly proportional to S_LUG, both experience the same percentage reduction. Table 4 shows the time taken and energy consumed by the benchmark programs with the original loop (Org), LU, and LUG. SCount and CSort with LUG achieve more gain in total energy (E_Tot) because their array initialization time (T_init) is much longer than their computation time (T_comp). KS, TI and DFS with LUG have less gain in E_Tot because their T_init is much smaller than T_comp. Thus, LUG is more applicable to programs having T_init ≥ T_comp.