Run-Time Storage Organization and Management (Compiler Writing) Part 1

Thus far we have concentrated on those parts of the compiler that are responsible for recognizing the syntactic correctness of a program. In the remainder of the text, the synthesis part of the compilation process is highlighted. Before we can begin to consider the generation of code for the constituent parts of a program (as recognized in the analysis phase), a clear understanding must be obtained of how the information in an executing program is to be stored and accessed. This topic is intended to provide this understanding. In particular, this topic provides a description of the static, dynamic, explicit, and implicit storage-allocation strategies that are most often adopted when compiling programs for the wide variety of languages that exist currently. The first section describes a rather simple storage-allocation strategy that is sufficient to handle statically defined data structures. Next dynamic storage allocation, which is commonly provided in block-oriented languages such as ALGOL, is discussed. Finally, the problems of managing a heap storage facility, which is frequently used in conjunction with explicit and implicit execution-time storage requests, are addressed.

STATIC STORAGE ALLOCATION

In a static storage-allocation strategy, it is necessary to be able to decide at compile time exactly where each data object will reside at run time. In order to make such a decision, at least two criteria must be met:

1. The size of each object must be known at compile time.


2. Only one occurrence of each object is allowable at a given moment during program execution.

Because of the first criterion, variable-length strings are disallowed, since their length cannot be established at compile time. Similarly dynamic arrays are disallowed, since their bounds are not known at compile time and hence the size of the data object is unknown.

Because of the second criterion, nested procedures are not possible in a static storage-allocation scheme. This is the case because it is not known at compile time which or how many nested procedures, and hence their local variables, will be active at execution time.

At compile time it is also impossible to determine how many times a recursive procedure is going to be invoked. Each invocation of a recursive procedure necessitates the creation of a new occurrence of each data object for that procedure. Therefore, by implication, the second criterion is violated and recursion cannot be accommodated in a static storage-allocation strategy.

It must be noted that FORTRAN does not provide variable-length strings, dynamic arrays, nested procedures, or recursive procedures. In fact FORTRAN typifies those languages in which a static storage-allocation policy is sufficient to handle the storage requirements of the data objects in a program.

A static storage-allocation strategy is very simple to implement. During an initial pass of the source text, a symbol-table entry is created for each variable and the set of attributes, as discussed in Chap. 8, is filled in. Included in these attributes is the object address. Because the precise amount of space required by each variable is known at compile time, the object address for a variable can be assigned according to the following simple scheme. The first variable is assigned some address A near the beginning of an allocated data area, the second variable is assigned address A + n1 assuming the first variable requires n} storage units (e.g., bytes), the third variable is assigned address A + nx + nz assuming the second variable requires n2 storage units, and so on. Figure 9-1 illustrates part of a symbol table that would be created for the given FORTRAN program segment assuming integer values require four storage units and real values require eight.

The data area that is required to hold the values of the program variables at run time can be allocated at compile time if desired. Generally this is unnecessary, since the data area can be established by a system utility (usually called the loader) prior to the beginning of program execution.

An object address can be either an absolute or a relative address. If the compiler is written for a single-job-at-a-time environment, the object address assigned is often an absolute address. The initial address A is set such that the program and data area reside in a section of memory separate from the resident parts of the operating system. If the compiler resides in a multiprogramming environment, the object address assigned is a relative address, that is, an address that is relative to the base or initial address of the data area that is allocated to a program. With a relative-addressing scheme, a program and its data area may reside at a different set of memory locations each time the program is executed. The loader accomplishes this relocation by reserving a set of memory locations for the program and setting a base register to the address of the first location in the data area.

(a) FORTRAN program segment (h) Associated symbol tabic

Figure 9-1 (a) FORTRAN program segment (h) Associated symbol tabic

A typical data area for a module of a program in which a static storage-allocation strategy is sufficient (e.g., a FORTRAN subroutine) is given in Fig. 9-2. We have adopted the notation of Gries (1971) and have divided the data area into three sections, one for the implicit parameters, one for the actual parameters, and one for the program variables. An implicit parameter is primarily used for communication with the calling module. Typically such a parameter is the return address to the calling procedure, or the return value of a functional procedure, when it is not convenient to return this value in a register. An actual parameter contains the value or address of the value of an argument that is designated in a call to the module. More will be said about parameter-passing schemes in Chap. 11. Finally, the program variables’ section contains the storage space for the simple variables, aggregates (i.e., arrays and records), compiler-generated temporary variables, etc.

Implicit parameters

Actual parameters

Simple variables, aggregates, and temporaries

Figure 9-2 A typical data area for a static storage allocation strategy.

A call to a procedure consists of the following steps:

1. Bring forward the values or evaluate the addresses of the actual parameters (i.e., arguments from the calling procedure) and store them in a list in the calling procedure’s data area.

2. Place the address of the parameter list in a register.

3. Branch to the procedure.

Prior to the execution of the procedure both the implicit and explicit parameters must be moved into the special locations that have been previously reserved in the data area. When returning to the calling procedure, the implicit parameters are loaded into registers and a jump back to the calling procedure occurs as dictated by the return address.

Let us turn our attention to the more complex dynamic storage-allocation strategy.

DYNAMIC STORAGE ALLOCATION

In a dynamic storage-allocation strategy, the data area requirements for a program are not known entirely at compilation time. In particular, the two criteria that were given in the previous section as necessary for static storage allocation do not apply for a dynamic storage-allocation scheme. The size and number of each object need not be known at compile time; however, they must be known at run time when a block is entered. Similarly more than one occurrence of a data object is allowed, provided that each new occurrence is initiated at run time when a block is entered.

As is indicated by the previous discussion, a dynamic storage-allocation strategy is very much a block-oriented strategy and hence is used in block-structured languages such as ALGOL and PL/I. Because of the nested properties of the blocks, a dynamic storage-allocation strategy can be modeled fairly simply using a stack like data area commonly called the run-time stack. Each block within the program can be viewed as having its own data area. When the block is invoked at run time, space for its data area is requested from the entire data area allotted to the program (i.e., from the run-time stack). This space is reserved until the entire block has been executed, at which time it is released (i.e., popped from the run-time stack). Note that between the invocation of a block and its eventual termination, several other blocks may be invoked through procedural calls or "BEGIN" block entries. The data areas for these blocks are simply pushed onto and popped off the stack in the fashion just described. Whenever execution returns to a block from which a call is initiated, the contents of the run-time stack should be the same as they were immediately prior to the call.

The action of this dynamic storage-allocation model can best be illustrated by an example. Figure 9-3, a reproduction of Fig. 8-25, illustrates a skeleton of a program segment from a nested language. Figure 9-4 provides a trace of the contents of the run-time stack as this module is executed.

 A program segment from a nested language

Figure 9-3 A program segment from a nested language

Trace of the run-time stack.

Figure 9-4 Trace of the run-time stack.

A number of important points are exemplified in the trace of Fig. 9-4. First, it is clear that the run-time stack does behave as a stack. When a block, say block /, is entered, a special data area is created on top of the run-time stack. This data area is commonly called an activation record, and it contains, among other things, the memory space necessary to hold the values of the variables local to that block. When execution of the i th block is ended (i.e., a block exit occurs), the corresponding activation record, denoted as AR(, is removed from the top of the run-time stack, since the variables defined in the /th block cannot be accessed outside of the block.

Activation Records

A brief scenario clarifying the trace of Fig. 9-4 follows. This scenario is intended to illustrate the role of an activation record in the execution of a program that uses dynamic storage allocation.

Local data

area

Parameter

area

Display

area

Figure 9-5 Layout of a typical activation record.

The program module is entered and an activation record for the first BBLOCK is created on the run-time stack as shown in Fig. 9-4a. Within the BBLOCK, a call is made to the PBLOCK named Ml. An activation record for Ml, designated as AR2, is then stacked as shown in Fig. 9-4b. During the execution of Ml, the PBLOCK named M2 is invoked, and this causes the stacking of the activation record AR3, as illustrated in Fig. 9-4c. Assuming the BBLOCK within M2 is entered, a new activation record AR4 is stacked as depicted in Fig. 9-4d.

Upon completion of the BBLOCK numbered 4 in Fig. 9-3, activation record AR4 is popped from the run-time stack, since the variables for that block are no longer accessible. As the run-time activity progresses and execution is terminated for the remaining blocks, the other activation records AR3, AR2, and ARt are unstacked in the same manner as AR4.

In addition to providing space for the storage of local variables, an activation record contains a parameter area and a display area. This is illustrated in Fig. 9-5, which depicts the layout of a typical activation record. In the next two subsections, each of these two areas is examined in detail.

Parameter Area

The parameter area holds both implicit and explicit parameters. Implicit parameters may include a return address, a pointer to the base of the previous activation record, and a return value. The return address is the address corresponding to the point in the program at which execution commences upon completion of the execution of the current block. A return address is not required for BEGIN-type blocks; however, space is sometimes allotted to standardize the form of the parameter area. The previous-activation-base pointer contains the base location of the activation record for the block from which control, say through a procedure call, was just transferred. This information must be maintained to ensure that the eventual return of control to the invoking procedure results in the restoration of the run-time environment as it was before the call was made. Finally, the return value for a functional procedure is sometimes included in the implicit parameter area. There are, however, other possibly better ways of handling return values, and these will be outlined in Sec. 11-6. In the remaining discussion in this section, a return value will not be included as an implicit parameter.

The set of explicit parameters (sometimes called the arguments of a procedure call) forms the other component of the parameter area. A variety of parameter- passing schemes exist, and some of these will be discussed in Sec. 11-6. In the detailed example given later in this subsection, a call-by-value parameter-passing scheme will be assumed. In such a scheme the values of the arguments (sometimes called actual parameters) are assigned to the formal parameters. The formal parameters are those parameters that are defined locally in the called procedure. Because the formal parameter values are identical to the values of the arguments, the formal parameters are usually considered to be situated in the parameter area in a call-by-value scheme.

Display Area

The final area that may be present in an activation record is the display area. This area contains the information necessary to access variables that are global to the block that is currently being executed. It is composed of a series of pointers, each pointer pointing to the base of an activation record of a block that is global to the block currently being executed. Referring to the program segment in Fig. 9-3, for example, it can be seen that when the block labeled 4 is executed, the variables declared in blocks 1 and 3 are accessible on a global basis. Therefore, the display area created when block 4 is activated should contain two pointer entries: one pointing at the base of the activation record for block 1 and the other pointing at the base of the activation record for block 3.

As mentioned in Sec. 8-3, a 2-tuple addressing scheme containing a block-level (BL) component and occurrence-number (ON) component is commonly adopted for block-structured languages. For example, in the procedure block Ml in Fig. 9-3, the variables IND and X would be assigned tuple addresses of (2.0), and (2,1), respectively. The block level of 2 is assigned because Ml is nested to depth 2. The occurrence numbers are assigned to the variables in order starting from 0 (hence in the example the values 0 and 1 are assigned).

The tuple address is used in conjunction with the display area to locate the value of a variable in the run-time stack. If the variable being referenced is a local variable, the location of the variable can be found at ON locations after the location of the first explicit parameter in the activation record. If the variable being referenced is a global variable, its BL component must be less than the block level of the block being executed currently. If this is the case, the base of the activation record which contains the location of the global variable can be referenced indirectly through the appropriate pointer in the display area of the current activation record. For example, if the variable X declared in the outermost block in Fig. 9-3 is referenced inside of the block numbered 4, its block level (i.e., 1) will be less than the current block level (i.e., 3). Its location can be found by referring to the BL = 1 or first element of the display area and using this activation-base-pointer value plus the occurrence number of 0 to locate the variable X in the activation record for the outermost block.

A trace of the run-tune stack during part of the execution of the program segment in Fig. 9-3 is given in Fig. 9-6. In this trace, the display and the implicit parameters (i.e., the return address and previous-activation-base pointer) are shown.

Detailed trace of the run-time stack

Figure 9-6 Detailed trace of the run-time stack

The trace proceeds only far enough to show the state of the run-time stack when execution begins on the BBLOCK numbered 4.

Execution begins with the BBLOCK numbered 1. Note that the activation record for this block does not contain a display, since there are no variables global to this block, or a parameter area, since this block was not invoked from another block. In practice there may in fact be a parameter area which contain*, a return address to some part of the operating system. Also, external parameters may be passed to the program through the parameter area.

When the procedure block Ml is invoked, a single-element display is constructed as depicted in Fig. 9-6b. The pointer abp(l), which is called an activation-base pointer, points to the base of the latest activation record for the block numbered 1. This display element facilitates the referencing of the global variables X, Y, or NAME. The return address, designated as "ret addr(l)," denotes the address to which control returns after Ml has been executed. The previous activation-base pointer (prev abp) is stacked in the parameter area along with the value that is assigned to the formal parameter IND.

Note that when the procedure block M2 is called, a single-element display is again created as shown in Fig. 9-6c. It is identical to the previous display, as it should be, since the variables that are global to procedure block M2 are the same as those for procedure block Ml, namely, those found in the block numbered 1. A more formal set of rules for how to construct a display will be given immediately after a discussion of the trace.

When the basic block numbered 4 is entered in the procedure block M2, the display that is created contains pointers to the base of the activation record for each of the blocks number 1 and 3. Therefore, the variables X, Y, NAME, and J, which are all variables global to the basic block numbered, are accessible through the two-element display.

After the basic block numbered 4 has been executed, control is returned to the procedure block M2. At this time, the activation record for the basic block must be logically removed from the run-time stack so as to create a run-time environment identical to what existed prior to the execution of the basic block. To achieve this recreated environment, the activation-base pointer is reset to the value of the "prev abp" so that the activation-base pointer ("abp") will be pointing at the base of the activation record AR3 for the procedure block M2. This process of resetting the activation-base pointer must take place each time the execution of a block is terminated and control returns to a block that was previously active.

The rules for constructing a display area can be summarized as follows. If a block at level j is entered (e.g., called) from a block at level then

1. If _/’ = /’ + 1 (i.e., entering a BEGIN-type of block or calling a procedure block that is declared locally to the current block), then recopy the display for the block at level / and append to this a pointer pointing to the base of the activation record for the block at level i (see Fig. 9-7a).

2. If j < i (i.e., calling a procedure block that is declared globally to the current block), then the first j — 1 entries from the display area in the activation record foi the block at level i form the display area for the block at level j (see Fig. 9-lb).

It should be noted that a display area need not be created for every activation record. If memory is a constraint, it may be better to have only a single display, namely, a display for the current block, created at any one time. Upon exiting a block, a history stack must then be used to help recreate the display the way it was prior to the execution of a block.

Illustration of two rules for constructing a display.

Figure 9-7 Illustration of two rules for constructing a display.

This recreation process is particularly a problem when a return is made to a block at level j from a block at level i when j is less than or equal to i (i.e., case b given earlier). A discussion of this approach can be found in McKeeman et al. (1970).

A third and very storage-efficient approach to the creation of a display area for a given block is to store in the activation record a pointer to the base of the activation record for the block that most closely surrounds the given block in the program. In this way, references to global variables can be resolved by descending a chain of activation-base pointers until the appropriate activation record is located. Although this method is storage-efficient, it is not time-efficient because we must travel along a number of pointers to get to the appropriate activation record.

Next post:

Previous post: