When we account for interprocedural effects, it is possible to assign
registers and position save-restore code in such a way that optimal register
allocation is obtained [KF96]. The improvements in execution speed that result
can sometimes be dramatic.
Some architectures, most notably the Sparc, provide register windows.
When a call is made, the callee is provided a set of architected registers that
are physically distinct from the caller's architected registers. Each such set
of registers is termed a window into the relatively large number of available
physical registers. This reduces the cost of calls, as saving and restoring of
registers is done automatically. Register windows are allowed to overlap
partially to facilitate parameter-passing through registers. Some registers may
remain common across calls to facilitate access to global values.
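The overlap between register windows can be made concrete with a small model. The sketch below is only an illustration: it loosely follows the Sparc convention of 8 global, 8 out, 8 local, and 8 in registers, but the window count, the function name physical_register, and the direction in which the window number advances on a call are assumptions chosen for readability, not a description of any particular processor.

```python
N_WINDOWS = 8          # assumed number of windows in the physical register file
GLOBALS = 8            # registers 0..7 are shared by every window
RING = 16 * N_WINDOWS  # each window owns 8 "in" and 8 "local" registers


def physical_register(window: int, r: int) -> int:
    """Map architected register r (0..31) of a window to a physical index.

    Assumption: a call moves from window w to window w + 1 (modulo N_WINDOWS).
    """
    if not 0 <= r < 32:
        raise ValueError("architected register must be in 0..31")
    if r < 8:                        # globals: common across calls
        return r
    base = 16 * (window % N_WINDOWS)
    if r < 16:                       # outs: shared with the callee's ins
        offset = base + 16 + (r - 8)
    elif r < 24:                     # locals: private to this window
        offset = base + 8 + (r - 16)
    else:                            # ins: shared with the caller's outs
        offset = base + (r - 24)
    return GLOBALS + offset % RING


if __name__ == "__main__":
    # The caller's first "out" register and the callee's first "in" register
    # name the same physical register, so a parameter is passed without a copy.
    print(physical_register(0, 8) == physical_register(1, 24))
```

In this model a caller in window 0 that writes architected register 8 has, in effect, already placed the value in the callee's architected register 24, while the local registers of the two windows never collide.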
13.4 Code Scheduling
We have already discussed the issues of instruction selection and register
allocation in code generation. Modern computer architectures have introduced a
new problem: code scheduling. Most modern computers utilize a pipelined
architecture. This means that instructions are processed in stages, with an
instruction progressing from stage to stage until it is completed. A number
of instructions can be in different stages of execution at the same time. This
is very important, since overlapping instruction execution allows much faster
execution speeds.
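To see why overlapped execution matters, consider an idealized pipeline in which every stage takes one cycle and no instruction ever stalls. The sketch below compares cycle counts under that assumption; the function names and the 5-stage, 100-instruction figures are illustrative choices, not measurements of a real machine.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """The first instruction needs n_stages cycles to fill the pipeline;
    each later instruction completes one cycle after its predecessor."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    n, k = 100, 5
    print(unpipelined_cycles(n, k))  # 500
    print(pipelined_cycles(n, k))    # 104
```

Under these assumptions the pipelined machine approaches one completed instruction per cycle, which is the speedup that stalls erode.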
What happens if one instruction being executed needs a value produced
by an earlier instruction that has not yet completed execution? Normally this
is not a problem; pipelines are designed to make results available as soon as
possible. In a few cases, however, a needed operand may not be available.
Then we have a stalled pipeline, delaying execution of an instruction (and its
successors) until the needed value is available.
Most current pipelined architectures use delayed loads. This means that
a register value fetched by a load instruction is not available in the very next
cycle. Instead, it is delayed for one or more execution cycles. For example, on a
MIPS R3000 processor, loads are delayed by one instruction. This delay allows
other instructions to be executed while the processor's cache is searched for
the fetched value. However, if the instruction immediately following the load
references the register, then the processor stalls while the cache is searched.
Thus the following instruction sequence, though valid, would stall:
    lw   $12, b           # Load b into register 12
    add  $10, $11, $12    # Add reg 11 and reg 12 into reg 10
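A code scheduler must be able to recognize sequences like this one. The following sketch flags every instruction that reads a register loaded by the instruction immediately before it, assuming a load delay of exactly one instruction as described above. The Instr record, the function name load_use_stalls, and the independent sub instruction in the second sequence are hypothetical and serve only as illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, e.g. "lw" or "add"
    dest: Optional[str]     # register written, if any
    srcs: Tuple[str, ...]   # registers or memory names read


def load_use_stalls(code: List[Instr]) -> List[int]:
    """Return the indices of instructions that would stall.

    Assumption: loads are delayed by one instruction, so only the
    instruction directly after a load can stall on the loaded register.
    """
    stalled = []
    for i in range(1, len(code)):
        prev, cur = code[i - 1], code[i]
        if prev.op == "lw" and prev.dest in cur.srcs:
            stalled.append(i)
    return stalled


if __name__ == "__main__":
    sequence = [
        Instr("lw", "$12", ("b",)),
        Instr("add", "$10", ("$11", "$12")),
    ]
    print(load_use_stalls(sequence))  # [1]

    # Hypothetical variant: an unrelated instruction separates load and use.
    separated = [
        Instr("lw", "$12", ("b",)),
        Instr("sub", "$13", ("$14", "$15")),
        Instr("add", "$10", ("$11", "$12")),
    ]
    print(load_use_stalls(separated))  # []
```

The first sequence mirrors the lw/add pair above and reports the add as stalled; in the second, the add no longer follows the load directly, so nothing is flagged.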
Stalls are not inevitable after a load.
If another instruction can be placed
 
 