[Figure: block diagram of the MCD processor. Recoverable block labels: Front End; External (Main Memory); L1-I Cache; Fetch Unit; ROB, Rename, Dispatch; Integer Issue Queue; Integer ALUs & Register File; FP Issue Queue; FP ALUs & Register File; Load/Store; Load/Store Queue; L1-D Cache; L2 Cache.]

Parameter                  Value(s)
Domain Voltage             0.65 V to 1.20 V
Domain Frequency           250 MHz to 1.0 GHz
Frequency Change Rate      49.1 ns/MHz
Domain Clock Jitter        110 ps, normally distributed about zero
Synchronization Window     30% of 1.0 GHz clock (300 ps)
FIGURE 3.6: MCD processor and clock parameters. Adapted from [199].
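To give a feel for what the clock parameters in Figure 3.6 imply, the following minimal Python sketch works through two pieces of arithmetic: how long a domain needs to ramp across its full frequency range at 49.1 ns/MHz, and how long 30% of a 1.0 GHz clock period is. The function and constant names are illustrative and do not come from [199].

```python
# Values taken from the parameter table of Figure 3.6.
F_MIN_MHZ = 250.0
F_MAX_MHZ = 1000.0
CHANGE_RATE_NS_PER_MHZ = 49.1


def transition_time_us(f_from_mhz, f_to_mhz, rate_ns_per_mhz=CHANGE_RATE_NS_PER_MHZ):
    """Time to ramp a domain between two frequencies, in microseconds."""
    return abs(f_to_mhz - f_from_mhz) * rate_ns_per_mhz / 1000.0


def clock_period_ps(f_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / f_mhz


# Full swing from 1.0 GHz down to 250 MHz: 750 MHz * 49.1 ns/MHz.
full_swing_us = transition_time_us(F_MAX_MHZ, F_MIN_MHZ)

# 30% of the 1.0 GHz period (1000 ps).
sync_window_ps = 0.30 * clock_period_ps(F_MAX_MHZ)

print(f"full-range frequency change: {full_swing_us:.3f} us")
print(f"synchronization window: {sync_window_ps:.0f} ps")
```

A full-range change therefore takes tens of microseconds, which is why the number of reconfigurations matters once realistic DVFS overhead is considered.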
data dependencies into a temporally ordered directed acyclic graph (DAG). A DAG is created
for an interval of 50K instructions and is then processed in two phases.
In the first phase, each event in the DAG that is not on the critical path is stretched,
as if each instruction could run at its own frequency. A multi-pass “shaker” algorithm tries to
distribute slack evenly in the DAG wherever it exists. This step concludes when all slack in the
DAG is removed and each instruction is assigned to run at one of the allowed frequencies (e.g.,
one of the 32 frequencies for the Transmeta Crusoe or one of the 320 in the Intel XScale).
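The idea behind the first phase can be illustrated with a small Python sketch. It is not the multi-pass shaker algorithm itself: as a simplifying assumption it uses a greedy single pass that lets each event absorb whatever slack it still has, so slack is removed but not distributed evenly. The event names, durations, and the four-level frequency set are hypothetical; only the 1.0 GHz maximum comes from the text.

```python
from graphlib import TopologicalSorter

F_MAX_MHZ = 1000.0
# Hypothetical allowed frequencies; real processors offer many more steps.
ALLOWED_MHZ = (250.0, 500.0, 750.0, 1000.0)


def schedule(durations, preds):
    """Return (earliest_start, latest_start, total_length) for all events."""
    order = list(TopologicalSorter(preds).static_order())
    earliest = {}
    for e in order:  # forward pass: predecessors come first
        earliest[e] = max(
            (earliest[p] + durations[p] for p in preds.get(e, ())), default=0.0
        )
    total = max(earliest[e] + durations[e] for e in order)
    succs = {e: [] for e in order}
    for e, ps in preds.items():
        for p in ps:
            succs[p].append(e)
    latest = {}
    for e in reversed(order):  # backward pass: successors come first
        latest_finish = min((latest[s] for s in succs[e]), default=total)
        latest[e] = latest_finish - durations[e]
    return earliest, latest, total


def stretch_greedy(durations, preds):
    """Stretch non-critical events until no slack remains (greedy stand-in)."""
    stretched = dict(durations)
    for e in TopologicalSorter(preds).static_order():
        earliest, latest, _ = schedule(stretched, preds)
        slack = latest[e] - earliest[e]
        if slack > 1e-9:
            stretched[e] += slack  # critical-path length is unchanged
    return stretched


def snap_frequency(required_mhz, allowed_mhz=ALLOWED_MHZ):
    """Lowest allowed frequency that is still fast enough for the event."""
    return min(f for f in allowed_mhz if f >= required_mhz - 1e-9)


# Durations in ns at F_MAX_MHZ; A -> B -> D is the critical path, C has slack.
durations = {"A": 10.0, "B": 10.0, "C": 5.0, "D": 10.0}
preds = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

stretched = stretch_greedy(durations, preds)
freqs = {
    e: snap_frequency(F_MAX_MHZ * durations[e] / stretched[e]) for e in durations
}
print(freqs)
```

In this toy DAG only event C is off the critical path, so it alone is stretched and assigned a lower frequency, while the total execution time stays the same.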
Since executing each instruction at a different frequency is not practical, the second phase
processes the results of the first phase and aims to find a single minimum frequency per interval
for each domain. This is done under the constraint that each domain finishes its work with
no more than a fixed, externally set factor of time dilation. Finally, intervals with the same
or similar frequencies are merged into larger combined intervals, and the merging is
repeated recursively, with the intent of reducing the number of reconfigurations. In contrast
to the first phase, where DVFS reconfiguration was treated as instantaneous and free,
the second phase takes realistic DVFS overhead (for the two processors studied) into account.
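The second phase can be sketched in Python under some stated assumptions. Here each domain's work in an interval is a list of (cycles, first-phase frequency) pairs; only events that wanted a higher frequency than the chosen one are counted as slowed down; and merging is a single left-to-right pass that keeps the higher frequency, rather than the recursive procedure described above. DVFS overhead is ignored. All names, the frequency set, and the sample numbers are hypothetical.

```python
# Hypothetical allowed frequencies in MHz.
ALLOWED_MHZ = (250.0, 500.0, 750.0, 1000.0)


def min_domain_frequency(events, allowed_mhz, dilation):
    """Lowest allowed frequency that keeps the slowdown within the budget.

    events:   list of (cycles, phase1_mhz) pairs for one domain and interval
    dilation: tolerated slowdown as a fraction of the ideal time (0.05 = 5%)
    """
    # cycles / MHz gives microseconds.
    ideal_us = sum(c / f for c, f in events)
    for f in sorted(allowed_mhz):
        # Extra time incurred by events that wanted to run faster than f.
        extra_us = sum(c / f - c / fi for c, fi in events if fi > f)
        if extra_us <= dilation * ideal_us:
            return f
    return max(allowed_mhz)


def merge_intervals(freqs, tolerance_mhz):
    """Merge adjacent intervals with similar frequencies.

    Returns a list of (number_of_intervals, frequency) pairs; a merged run
    uses the highest frequency of its members so that no member is slowed.
    """
    merged = []
    for f in freqs:
        if merged and abs(merged[-1][1] - f) <= tolerance_mhz:
            n, g = merged[-1]
            merged[-1] = (n + 1, max(g, f))
        else:
            merged.append((1, f))
    return merged


# One interval: 1000 cycles wanted 1 GHz, 9000 cycles wanted 250 MHz.
events = [(1000, 1000.0), (9000, 250.0)]
chosen = min_domain_frequency(events, ALLOWED_MHZ, dilation=0.05)

# Per-interval frequencies for one domain, merged with a 250 MHz tolerance.
runs = merge_intervals([500.0, 500.0, 750.0, 250.0, 250.0], tolerance_mhz=250.0)
print(chosen, runs)
```

With a 5% dilation budget the sample interval settles on 500 MHz: running everything at 250 MHz would add 3 µs to a 37 µs ideal time, which exceeds the 1.85 µs budget, while 500 MHz adds only 1 µs.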
Online approach: By analyzing the resource utilization of various CPU structures in [199],
Semeraro et al. discovered that there is a significant correlation between the number of valid
entries in the input queues for each domain and the desired frequency of the domain as derived
by their offline method (see above). In other words, the occupancy of these queues reveals the