Target Code Generation - Crafting a Compiler


When we account for interprocedural effects, it is possible to assign registers and position save-restore code in such a way that optimal register allocation is obtained [KF96]. The resulting improvements in execution speed can sometimes be dramatic.

Some architectures, most notably the Sparc, provide register windows. When a call is made, the callee is given a set of architected registers that are physically distinct from the caller's architected registers. Each such set of registers is termed a window into the relatively large number of available physical registers. This reduces the cost of calls, because registers are saved and restored automatically. Register windows may partially overlap to facilitate parameter passing through registers. Some registers may remain common across calls to facilitate access to global values.
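To make the overlap concrete, here is a minimal Python sketch of a register-window file. It is a simplified model, not the Sparc specification: the class `RegisterWindows`, its method names, and the layout of 8 in, 8 local, and 8 out registers per window plus 8 shared globals are assumptions for illustration, and window overflow (spilling windows to memory when they run out) is not modeled. A call simply advances the current window pointer, so the caller's out registers become the callee's in registers.

```python
class RegisterWindows:
    """Simplified overlapping register windows (hypothetical model).

    Each window sees 8 'in', 8 'local', and 8 'out' registers; 8 'global'
    registers are shared by all windows. Window k's outs are physically the
    same registers as window k+1's ins, so arguments pass without copying.
    """

    OFFSETS = {"in": 0, "local": 8, "out": 16}

    def __init__(self, n_windows: int = 8) -> None:
        self.n_windows = n_windows
        self.cwp = 0                      # current window pointer
        self.globals = [0] * 8
        # Window k uses slots 16k..16k+23; consecutive windows share 8 slots.
        self.physical = [0] * (16 * n_windows + 8)

    def _slot(self, kind: str, i: int) -> int:
        if not 0 <= i < 8:
            raise IndexError("register number must be in 0..7")
        return 16 * self.cwp + self.OFFSETS[kind] + i

    def read(self, kind: str, i: int) -> int:
        if kind == "global":
            return self.globals[i]
        return self.physical[self._slot(kind, i)]

    def write(self, kind: str, i: int, value: int) -> None:
        if kind == "global":
            self.globals[i] = value
        else:
            self.physical[self._slot(kind, i)] = value

    def call(self) -> None:
        # A real system would trap and spill a window to memory here.
        if self.cwp + 1 >= self.n_windows:
            raise RuntimeError("window overflow (spilling not modeled)")
        self.cwp += 1

    def ret(self) -> None:
        if self.cwp == 0:
            raise RuntimeError("window underflow")
        self.cwp -= 1


if __name__ == "__main__":
    rw = RegisterWindows()
    rw.write("out", 0, 42)      # caller passes an argument in out register 0
    rw.write("local", 3, 7)     # caller's private local
    rw.call()
    print(rw.read("in", 0))     # 42: the callee sees it as in register 0
    rw.write("local", 3, 99)    # callee's local is a different physical register
    rw.ret()
    print(rw.read("local", 3))  # 7: the caller's local survived the call
```

No registers are copied on the call; only the window pointer changes, which is the source of the reduced call cost.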

13.4 Code Scheduling

We have already discussed the issues of instruction selection and register allocation in code generation. Modern computer architectures have introduced a new problem: code scheduling. Most modern computers use a pipelined architecture. This means that instructions are processed in stages, with an instruction progressing from stage to stage until it is completed. A number of instructions can be in different stages of execution at the same time. This is very important, because overlapping the execution of instructions allows much faster execution speeds.
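As a rough illustration of why overlap helps, assume an idealized pipeline with k stages, one cycle per stage, and no stalls. Executing n instructions one at a time takes n * k cycles, whereas the pipeline finishes the first instruction after k cycles and one more on each following cycle, for k + n - 1 cycles in total. The Python sketch below computes both and lists which stage each instruction occupies per cycle; the five stage names are a hypothetical classic RISC layout, not taken from the text.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")  # hypothetical 5-stage pipeline


def unpipelined_cycles(n: int, k: int) -> int:
    """Each instruction runs through all k stages before the next one starts."""
    return n * k


def pipelined_cycles(n: int, k: int) -> int:
    """Ideal pipeline: k cycles to fill, then one instruction completes per cycle."""
    return 0 if n == 0 else k + n - 1


def stage_table(n: int, stages=STAGES):
    """For each cycle, map instruction names to the stage they occupy."""
    k = len(stages)
    table = []
    for cycle in range(pipelined_cycles(n, k)):
        row = {}
        for i in range(n):
            s = cycle - i  # instruction i enters the pipeline at cycle i
            if 0 <= s < k:
                row[f"i{i}"] = stages[s]
        table.append(row)
    return table


if __name__ == "__main__":
    print(unpipelined_cycles(100, 5), pipelined_cycles(100, 5))  # 500 104
    for cycle, row in enumerate(stage_table(3)):
        print(cycle, row)
```

Under these assumptions 100 instructions need 104 cycles instead of 500; any stall adds cycles to the pipelined figure.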

What happens if one instruction being executed needs a value produced by an earlier instruction that has not yet completed execution? Normally this is not a problem; pipelines are designed to make results available as soon as possible. In a few cases, however, a needed operand may not be available. Then we have a stalled pipeline, delaying execution of an instruction (and its successors) until the needed value is available.
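The stall condition can be sketched as a small in-order, single-issue pipeline model. Each instruction issues one cycle after its predecessor unless one of its source registers is not yet ready, in which case it (and everything behind it) waits. The `Instr` record, the `schedule_in_order` function, and the latency values in `LATENCY` are hypothetical; real latencies depend on the processor and its forwarding paths, and this model ignores write-after-write ordering.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    op: str                 # operation name, used to look up its latency
    dest: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]   # registers read


# Hypothetical result latencies: cycles after issue before a dependent
# instruction may issue. 1 means the very next instruction can use the result.
LATENCY: Dict[str, int] = {"add": 1, "sub": 1, "mul": 3, "load": 2}


def schedule_in_order(instrs: List[Instr],
                      latency: Dict[str, int] = LATENCY) -> Tuple[List[int], int]:
    """Return (issue cycle of each instruction, total stall cycles)."""
    ready: Dict[str, int] = {}   # register -> first cycle its value is usable
    issue: List[int] = []
    stalls = 0
    next_slot = 0                # issue cycle if nothing stalls
    for ins in instrs:
        t = max([next_slot] + [ready.get(r, 0) for r in ins.srcs])
        stalls += t - next_slot  # cycles spent waiting for operands
        issue.append(t)
        if ins.dest is not None:
            ready[ins.dest] = t + latency[ins.op]
        next_slot = t + 1
    return issue, stalls


if __name__ == "__main__":
    prog = [
        Instr("load", "r1", ()),           # r1 <- memory
        Instr("add", "r3", ("r1", "r2")),  # needs r1 immediately
        Instr("sub", "r4", ("r3", "r2")),  # needs r3, available next cycle
    ]
    print(schedule_in_order(prog))  # ([0, 2, 3], 1)
```

With the assumed load latency of 2, an instruction that uses a loaded value right away waits one cycle, and the instructions behind it are delayed as well.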

Most current pipelined architectures use delayed loads. This means that a register value fetched by a load instruction is not available in the very next cycle; instead, it is delayed for one or more execution cycles. For example, on a MIPS R3000 processor, loads are delayed by one instruction. This delay allows other instructions to be executed while the processor's cache is searched for the fetched value. However, if the instruction immediately following the load references the loaded register, the processor stalls while the cache is searched.

Thus the following instruction sequence, though valid, would stall:

```asm
lw   $12, b          # Load b into register 12
add  $10, $11, $12   # Add reg 11 and reg 12 into reg 10
```
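The load-use check for a one-instruction load delay can be sketched directly on listings like the one above. The helpers `parse`, `source_registers`, and `load_use_stalls` are hypothetical, and the parser understands only a small MIPS subset (loads, stores, and ALU instructions whose first operand is the destination); branches and other forms are not handled. It counts one stall each time the instruction immediately after a load reads the loaded register.

```python
import re
from typing import List, Optional, Set, Tuple

LOADS = {"lw", "lh", "lhu", "lb", "lbu"}
STORES = {"sw", "sh", "sb"}


def parse(line: str) -> Optional[Tuple[str, List[str]]]:
    """Split 'op a, b, c  # comment' into ('op', ['a', 'b', 'c'])."""
    code = line.split("#", 1)[0].strip()
    if not code:
        return None
    parts = code.split(None, 1)
    rest = parts[1] if len(parts) > 1 else ""
    return parts[0], [x.strip() for x in rest.split(",") if x.strip()]


def source_registers(op: str, operands: List[str]) -> Set[str]:
    """Registers an instruction reads (simple subset only)."""
    # Stores read every operand; other supported forms write their first operand.
    read = operands if op in STORES else operands[1:]
    return {r for text in read for r in re.findall(r"\$\w+", text)}


def load_use_stalls(listing: str) -> int:
    """Count stalls: a load followed immediately by a reader of its target."""
    instrs = [p for p in map(parse, listing.splitlines()) if p is not None]
    stalls = 0
    for (op, ops), (next_op, next_ops) in zip(instrs, instrs[1:]):
        if op in LOADS and ops[0] in source_registers(next_op, next_ops):
            stalls += 1
    return stalls


STALLS = """
lw   $12, b          # Load b into register 12
add  $10, $11, $12   # Add reg 11 and reg 12 into reg 10
"""

NO_STALL = """
lw   $12, b
sub  $8, $9, $9      # hypothetical independent instruction
add  $10, $11, $12
"""

if __name__ == "__main__":
    print(load_use_stalls(STALLS), load_use_stalls(NO_STALL))  # 1 0
```

In this model, placing an instruction that does not read `$12` between the load and its use removes the stall, which is the kind of rearrangement a code scheduler looks for.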

Stalls are not inevitable after a load.

If another instruction can be placed
