Target Code Generation - Crafting a Compiler

Java Reference

In-Depth Information

13.5 Automatic Instruction Selection

An important aspect of code generation is instruction selection .Afteratrans-

lation for a particular construct is determined, the machine-level instructions

that implement the translation must be chosen. Thus, we may decide to im-

plement a switch statement using a jump table. If so, instructions that index

into the jump table and execute an indirect jump must be generated.

Often several di

erent instruction sequences can implement a particular

translation. Even something as simple as a+1 can be implemented by load-

ing 1 into a register and generating an add instruction, or by generating an

increment or add-immediate instruction. We normally want the smallest or

fastest instruction sequence. Thus, an add-immediate instruction is preferred

because it avoids an explicit load.

In simple RISC architectures, the choice of potential instruction sequences

is limited because almost all operands must be loaded into registers before

they can be used (immediate operands being a notable exception). Further,

the variety of addressing modes provided is also spartan; often only absolute

and indexed addresses are allowed.

Older architectures, like the Motorola 680x0 and Intel x86, are much more

elaborate. Many di

ff

erent operation codes are provided, and a wide variety of

addressing modes are available. Operands do not always need to be loaded

into registers; addressing modes can fetch operands indirectly and can incre-

ment and decrement registers. Di

ff

erent register classes (e.g., address registers

and data registers) are used in di

erent instructions (in a non-interchangeable

manner) and particular registers are sometimes “wired into” certain instruc-

tions. The JVM also has a relatively elaborate instruction set because its design

was based on achieving compact code sequences. As a result, there are several

ways to increment the contents of a register, but one sequence of bytecode

instructions achieves that e

ff

ect most compactly.

For very complex architectures, amethod of systematizing and automating

instruction selection is vital. Even for simpler architectures, itmay be necessary

to “extend” a code generator when a successor architecture is introduced. Very

ambitious compilersmay aimto compile intomore than one target architecture,

mandating alternative instruction sequences for di

ff

erent target machines.

Instruction selection is often simplified by translating source language con-

structs into a very low-level tree-structured intermediate representation (IR).

In this IR, leaves represent registers, memory locations, or literal values,

and internal nodes represent basic operations on operand values. Detailed

data access patterns and manipulations are exposed. Consider the statement

b[i]=a+1,whereb is an array of integers, i is a global integer variable, and a is

a local variable accessed through the frame register, $fp. The statement's tree-

structured IR is shown in Figure 13.26. This representation is very similar to

ff

Crafting a Compiler

Search WWH ::

Custom Search

Home