 1. lw  $10,a
 2. lw  $11,b
 3. lw  $12,d
 4. lw  $13,c
 5. mul $11,$10,$11
 6. add $10,$13,$12
 7. mul $11,$11,$10
 8. mul $12,$10,$12
 9. add $12,$11,$12
10. sw  $12,a

Figure 13.25: Delay-free MIPS code for a=((a*b)*(c+d))+(d*(c+d))
instruction 4 uses a previous value of $10, loaded in instruction 1. If we add an
additional register, $13, to our allocation, we can load it in instruction 5 (taking
care to reference $13 rather than $10 in instruction 6). Now instruction 5 can
be moved earlier in the sequence, avoiding a stall. The resulting delay-free
code is shown in Figure 13.25.
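As a rough check, the schedule of Figure 13.25 can be verified mechanically. The following Python sketch (an illustration, not part of the text's algorithm) assumes a one-cycle load delay slot, as on the early MIPS pipeline, and counts the stalls that arise when an instruction uses the value loaded by the immediately preceding lw:

```python
# Sketch: count load-use stalls, assuming a one-cycle load delay slot.
def count_stalls(code):
    """code: list of (op, dest, sources) tuples in program order."""
    stalls = 0
    for i in range(1, len(code)):
        prev_op, prev_dest, _ = code[i - 1]
        _, _, srcs = code[i]
        # A stall occurs when an instruction uses the value loaded
        # by the lw that immediately precedes it.
        if prev_op == "lw" and prev_dest in srcs:
            stalls += 1
    return stalls

# The sequence of Figure 13.25: no lw result is used by the
# instruction that immediately follows the load.
code = [
    ("lw",  "$10", []),              # 1. lw  $10,a
    ("lw",  "$11", []),              # 2. lw  $11,b
    ("lw",  "$12", []),              # 3. lw  $12,d
    ("lw",  "$13", []),              # 4. lw  $13,c
    ("mul", "$11", ["$10", "$11"]),  # 5. mul $11,$10,$11
    ("add", "$10", ["$13", "$12"]),  # 6. add $10,$13,$12
    ("mul", "$11", ["$11", "$10"]),  # 7. mul $11,$11,$10
    ("mul", "$12", ["$10", "$12"]),  # 8. mul $12,$10,$12
    ("add", "$12", ["$11", "$12"]),  # 9. add $12,$11,$12
    ("sw",  "$12", ["$12"]),         # 10. sw $12,a
]
print(count_stalls(code))  # 0: the schedule is delay-free
```

Reordering instruction 5 ahead of any lw whose result it consumes would make the count nonzero, which is exactly what the scheduler is trying to prevent.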
It is evident that there is a tension between code scheduling, which tries to increase the number of registers used (to avoid stalls), and code generation, which seeks to reduce the number of registers used (to avoid spills and make registers available for other purposes). An alternative to postpass code scheduling is an integrated approach that intermixes register allocation and code scheduling.
The Goodman-Hsu [GH88] algorithm is a well-known integrated register allocator and code scheduler. As long as registers are available, it uses them to improve code scheduling by loading needed values into distinct registers. This allows loads to “float” to the beginning of the code sequence, eliminating stalls in later instructions that use the loaded values. When registers grow scarce, the algorithm switches emphasis and begins to schedule code to free registers. When sufficient registers are available, it resumes scheduling to avoid stalls. Experience has shown that this approach nicely balances the need to use registers sparingly and yet avoid stalls whenever possible.
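The switch of emphasis can be sketched as a list scheduler whose priority function depends on register pressure. This is a simplified illustration of the idea, not the published Goodman-Hsu algorithm; the candidate attributes and the pressure threshold are invented for the example:

```python
# Sketch of an integrated scheduler: choose the next instruction from the
# ready set, switching priority based on how many registers remain free.
def schedule(ready, num_free_regs, threshold=2):
    """Pick the next instruction to issue.

    Each candidate is a dict with (hypothetical) attributes:
      'latency' - cycles until its result is available
      'frees'   - registers whose last use this instruction is
      'needs'   - new registers its result would occupy
    """
    if num_free_regs > threshold:
        # Registers are plentiful: issue long-latency operations
        # (such as loads) as early as possible, to hide stalls.
        return max(ready, key=lambda ins: ins["latency"])
    # Registers are scarce: prefer instructions that free registers.
    return max(ready, key=lambda ins: ins["frees"] - ins["needs"])

ready = [
    {"name": "lw",  "latency": 2, "frees": 0, "needs": 1},
    {"name": "add", "latency": 1, "frees": 1, "needs": 0},
]
print(schedule(ready, num_free_regs=4)["name"])  # lw: plenty of registers
print(schedule(ready, num_free_regs=1)["name"])  # add: frees a register
```

With registers to spare, the load floats upward; under pressure, the add is issued first because it releases a register.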
13.4.2 Global and Dynamic Code Scheduling
Although we have focused on code scheduling at the basic block level, researchers have also studied global code scheduling [BR91]. Instructions may be moved upward, past the beginning of a basic block, to predecessor blocks in the control flow graph. We may need to move instructions out of a basic block because basic blocks are often very small, sometimes only an instruction or two in size. Moreover, certain instructions, like loads and floating point multiplies and divides, can incur long latencies. For example, a load that misses in the primary cache may stall for 10 or more cycles; a miss in the secondary cache (to main memory) can cost 100 or more cycles.
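The upward motion itself is mechanically simple; what makes global scheduling hard is proving the move safe. The toy sketch below (block names and instructions are invented) hoists the first instruction of a block into every predecessor, duplicating it when the block has more than one. A real global scheduler must first verify that no dependence is violated and that the duplicated instruction cannot fault or clobber a live value on any path:

```python
# Toy sketch of upward code motion in a CFG: hoist the first instruction
# of `block` into each of its predecessors. Safety checks are omitted;
# a real compiler must verify dependences and side effects first.
def hoist_first(cfg, preds, block):
    """cfg: block name -> instruction list; preds: block name -> predecessor names."""
    ins = cfg[block].pop(0)
    for p in preds[block]:
        cfg[p].append(ins)  # duplicated into each predecessor

cfg = {
    "B1": ["add $12,$1,$2"],
    "B2": ["sub $12,$1,$2"],
    "B3": ["lw $10,x", "mul $11,$10,$11"],
}
preds = {"B3": ["B1", "B2"]}
hoist_first(cfg, preds, "B3")
print(cfg["B1"])  # ['add $12,$1,$2', 'lw $10,x']
print(cfg["B3"])  # ['mul $11,$10,$11']
```

After the move, the load issues one block earlier on both paths into B3, so its latency can overlap the predecessor's remaining work instead of stalling the multiply.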