Intermediate Forms of Source Programs (Compiler Writing) Part 2

ABSTRACT MACHINE CODE

Current emphasis in compiler design includes producing compilers that are both portable and adaptable. One approach that has been used to achieve portability and adaptability in compilers is to produce, as a form of intermediate source code, code for an abstract machine. The instruction set for this machine should closely model the constructs of the source languages that are to be compiled. This section begins by examining the notions of portability and adaptability. We also present the concept of an abstract machine. The section also contains a description of an abstract machine that has been created to facilitate the portability of PASCAL programs. A brief description of the intermediate form of source code (P-code) produced for this machine is given.

Portability and Abstract Machine

A program is said to be portable if it can be moved to another machine with relative ease. That is, the effort to move the program is substantially less than the initial effort to write it. On the other hand, a program is adaptable if it can be readily customized to meet several user and system requirements.

Suppose that we want to move a given compiler from, say, machine X to a different machine Y. To realize such a move, we must rewrite the code-generation routines of the existing compiler to produce code for machine Y. The rewriting task is much easier to accomplish if the original compiler has been divided into two parts with a well-defined interface. The first part (or front end) deals with the source language and the second (or back end) with the target machine. For a well-defined interface only the target-machine part need be changed. One form of interface is a symbolic program in an intermediate assembly system.


The flow of information between the two parts of a compiler takes the form of language constructs in one direction (front end to back end), and target-machine information in the other direction (back end to front end). The interface can be realized by using an abstract machine. The source-language constructs can be mapped into pseudo operations on this abstract machine. The abstract machine is designed for the particular source language, for example, a PASCAL machine.

The abstract machine is modeled on the basis of the operations and modes that are primitive in the source language (e.g., PASCAL). The first part of the compiler translates every primitive operation and primitive mode in a source program into abstract-machine instructions. The primitive operations of the abstract machine correspond to the simplest and most direct operations constituting a source program. Also, the primitive data modes of the abstract machine can be the simplest types in the language (e.g., INTEGER and CHAR in PASCAL), or more complex modes (e.g., structured types in PASCAL). A pair, consisting of a primitive mode and a primitive operation, describes an instruction on the abstract machine.

The abstract-machine architecture has an environment in which the modes and operations interact to model the language. The designer of an abstract machine can concentrate on a structure which facilitates the operations in the given source language. The designer must also, however, take into consideration the efficient implementation of the abstract machine on an actual computer.

One important advantage of using an abstract machine is the clean separation of the front end and back end of the compiler. This separation, in theory at least, means that moving a compiler to a new machine will require a new back end to the compiler with little, if any, change to its front end.

Suppose that it is required to implement m distinct languages on n different machines. Without using some form of intermediate source code, such as an abstract machine, m * n different compilers would have to be written, that is, one for each language/machine combination. In an abstract-machine approach, however, only m front ends and n back ends are required. A compiler for a certain programming language and target machine can then be generated by selecting the appropriate front end and back end. Using this approach, m * n different compilers can be generated from m + n components.

Today, it is quite feasible for a given programming language such as PASCAL to produce efficient object code for several different target machines. A more difficult problem is to have an abstract machine from which efficient object code can be produced for several programming languages. The difficulty is to have one abstract machine that efficiently models all of these programming languages. Since various programming language such as FORTRAN, LISP, BASIC, and PASCAL are so different, it is almost impossible to find an appropriate abstract machine. The pseudocode produced can often become real machine code for computers in which the pseudo machine code can be microprogrammed into the instruction set.

We have introduced the notion of an abstract machine in compiler writing in this section. The next section gives a description of an abstract machine for the PASCAL language.

The P-Code Abstract Machine for PASCAL

The Pascal-P compiler (Nori et al., 1981) is a portable compiler for essentially " standard PASCAL." The compiler produces object code (P-code) for an abstract machine. This section briefly describes this abstract machine.

The abstract machine called a P-machine is a simple machine-independent stack computer. The Pascal-P language is easily transported, at low cost, to a variety of real machines. "

The hypothetical stack machine has five registers and a memory. The registers are

1. PC—the program counter

2. NP—the new pointer

3. SP—the stack pointer

4. MP—the mark pointer

5. EP—die extremestack pointer

The last four pointers are associated with the management of storage in memory.

 Memory layout of stack computer

Figure 10-4 Memory layout of stack computer

The memory, which can be viewed as a linear array of words, is divided into two main parts. The first part of memory, CODE, contains the machine instructions. The second part of memory, STORE, contains the data (i.e., noninstruc-tions) part of a program. The layout of the computer’s memory is illustrated in Fig. 10-4. Observe that PC refers to the location of an instruction in CODE. On the other hand, the pointers EP, MP, NP, and SP refer to positions in STORE.

STORE contains two parts; the first part represents the various constants of a given program while the second part is dedicated to the other data requirements of the executing program.

The stack, whose top element is denoted by SP, contains all directly addressable data according to the data declarations in the program. The heap, with associated top element NP, consists of the data that have been created as a result of direct programmer control (i.e., new statements). The heap is similar to a second stack structure.

The stack consists of a sequence of data segments. Each data segment is associated with the activation of a procedure or a function.

Each data segment contains the following sequence of items:

1. A mark stack part

2. A (possibly empty) parameter section for the routine associated with the data segment

3. A local-data section for any local variables

4. Temporary elements that are required by the processing statements

The mark stack part has the following fields:

1. Value field for a function (present but not used in a procedure)

2. Static link

3. Dynamic link

4. Maximum stack size value

5. Return address

Observe that the static and dynamic links refer to STORE and the return address denotes a position in CODE.

The basic modes of PASCAL (e.g., INTEGER, REAL) are supported on the stack computer. The machine has several classes of instructions, for example, arithmetic (for integers and real), logical, and relational. Each instruction has an opcode and possibly two parameters. The second parameter denotes an address. Many of the instructions in the instruction repertoire are closely related to the PASCAL language. For example, there is an instruction for set, set intersection, set union, and set membership to support the operations on sets in PASCAL.

The Pascal-P compiler has been transported to several real machines. In fact, an implementation kit has been prepared to assist systems people to move the compiler.

P-code has been extended to U-code (Perkins and Sites, 1979) to facilitate certain kinds of code optimization. Although P-code is a good language for specifying a variety of optimization techniques, it appears that it is incomplete for performing good optimization.

Next post:

Previous post: