Programming-Language Design (Compiler Writing) Part 4

Control Structure

There has been a great deal of controversy regarding the best choice for control-structure primitives. Much of the discussion centers around the inclusion or exclusion of the GOTO. It is almost universally agreed that it is better to use high-level control constructs rather than the unrestricted GOTO. It is equally widely accepted that a suitable set of high-level constructs would eliminate both the necessity of and the desire for the GOTO. What is not so well agreed upon is what should comprise such a set. It is obviously necessary to be able to group several statements for execution in sequence. There is little dispute that the IF statement and some sort of loop terminating on a Boolean condition are desirable. Almost as indisputable are some form of CASE statement for multiway choices and some form of FOR loop controlling an index variable. The major area of dispute is centered around "escape" constructs and constructs for the handling of exceptional conditions which often, but not always, are one and the same.

Clearly, this discussion has some elements of the minimal versus minimal-useful issue mentioned in the consideration of design philosophy in Sec. 3-4. No one disputes that a program which uses escape constructs can also be written without them. The basic question is whether it is significantly easier to read and to write programs with such constructs available—more research is required in this area. The rest of this subsection will discuss various constructs in detail.


The simplest control structure is the combination of several statements into a single statement, as with ALGOL’S BEGIN-END statement. Languages without such a construct (FORTRAN, BASIC) suffer badly from its absence. Given its desirability, there are three distinct methods for obtaining such a combination’: explicit brackets, self-bracketing constructs, and bracketing by indentation.

All explicit-bracketing schemes unfortunately have one defect. They treat the cases "one statement" and "several statements" differently (a violation of "uniformity"). It is often necessary, for debugging or modification, to put several statements where before only one was present. Any explicit-bracketing scheme requires brackets to be inserted whenever this is done, which is a major annoyance.

This difficulty is solved by the self-bracketing-constructs approach, in which any control construct which may have to contain some statements is organized so as to have distinctive punctuation bracketing the point where the statements may occur. For example, one may add a terminating keyword to an IF to make it self-bracketing:

IF expression THEN statements FI

or

IF expression THEN statements ELSE statements FI

It is comparatively easy to make the full set of control constructs self-bracketing, thereby eliminating explicit bracketing. The different keywords terminating each self-bracketing construct also add valuable redundancy. Self-bracketing con-

THE THEORY AND PRACTICE OF COMPILER WRITING

The final possibility, bracketing by indentation, uses the two-dimensional nature of the program to make the compiler aware that the ELSE in belongs to the outer IF rather than to the inner one.

tmp4C8-9_thumb[2][2][2]

This approach, which has been around for quite some time, appears interesting but contains some possible pitfalls. The loss of the terminating keywords is a significant loss of redundancy. This may cause trouble if a new line with an incorrect indentation level is added or if revisions to a section of a program require widespread changes in indentation level. Also, indentation is not generally significant in natural languages; programmers are not accustomed to considering it as significant.

Whenever statements are grouped, it is obviously necessary to separate them from each other. The almost universally accepted convention is to use the semicolon ";". One minor issue of debate is whether the semicolon should separate statements ("BEGIN . . . ; . . . ; … END") or terminate them ("BEGIN …; …; …; END"). Using it as a separator is perhaps more elegant, and can sometimes be easier to parse, but actual measurements (Gannon, 1975) show that the terminating-semicolon form is far less error-prone.

The next most fundamental control construct is the IF statement, which provides a two-way choice of action. In such a choice, it is quite possible that one of the two actions is in fact null, thus resulting in the familiar

tmp4C8-10_thumb[2][2][2]

(the FI together with the THEN serves to bracket the statement sequence). However, the second action is not always null, which leads to the equally familiar

tmp4C8-11_thumb[2][2][2]

which provides for both actions. Measurements by Alexander (1972) on the relative frequency of these two forms reveals that roughly 35 percent of IFs have an ELSE. This number is too small to justify requiring the ELSE; so the IF should be accepted as having both forms.

One issue winch is relevant to several constructs, including the IF, is the question "what is a Boolean expression?" To date, this question has been answered in at least three different ways:

1. "Boolean" is an independent data type, with constants "TRUE" and "FALSE." Certain operators, such as the comparison operators, yield Boolean results.

2. "Boolean" is the same as "integer." True and false are nonzero and zero, respectively. Comparison operators yield 1 or 0.

3. "Boolean" is the same as "integer." True and false are odd and even, respectively (i.e., distinguished by the low-order bit of the integer). Comparison operators yield 1 or 0.

Of these three, the first is preferable. The difficulty with the second is that "clever" programmers often make use of their knowledge of how the Boolean expression is evaluated to write "IF x" where they should write "IF x = 1." The resulting loss in readability is sizable. Similar objections apply to the third alternative. Another problem with the second and third methods is that the confusion of "Boolean" and "integer" interferes with error checking.

A natural generalization of the IF is from a two-way choice to an n-way choice. Often this choice is made on the basis of the value of an integer variable. Several forms of the CASE construct have been proposed, and several issues are of importance.

First, many older CASE constructs rely on the textual order of the alternatives in order to decide which case corresponds to which alternative:

tmp4C8-12_thumb[2][2][2]

(Note that "OR" is used as a delimiter, not as the "or" operator.) Textual order unfortunately opens up possibilities for error if one case is forgotten or misplaced. A much better scheme is one in which each case is labeled with the value(s) that cause it to be chosen:

tmp4C8-13_thumb[2][2][2]

It is usually desirable to specify the range of possible choice values, partly for easier implementation and partly to provide valuable redundancy. It definitely is useful to be able to specify the same action for more than one value and to be able to give a range of values instead of writing them all out. For the sake of readability it is highly desirable that the values used in the labels be restricted to simple constants.

Note also that with this scheme there really is no reason to restrict the choice-expression type to,be "integer"; it would be equally valid to make a choice on the basis of a value of "character" or enumeration type, for example. Efficient implementation should decide what limitations must be placed on the type of the choice expressioh.

Another issue which is relevant is the question of the precise implementation of such a CASF. At least two alternatives suggest themselves: either use the value of the choice expression to index into a table of action addresses, or evaluate the choice expression, store its value, and compare it successively with each value label. Each scheme has its advantages. Indexing is faster, whereas (if ranges are allowed as labels) sequential checking can reduce storage requirements noticeably. It may be worthwhile to give the programmer a choice or allow the compiler to choose based on an optimization formula.

What if the value of the choice expression does not correspond to any of the value labels?

This definitely should not cause a random jump to somewhere in the code! The usual strategy here is to include an ELSE clause which is executed if none of the alternatives are selected:

tmp4C8-14_thumb[2][2][2]

It has also been suggested (Weinberg et al., 1975), with some justification, that it would be useful to have a clause (perhaps an "ALSO" clause) containing statements that would be executed if any of the labeled alternatives was executed. For example, this would provide a simple way of handling matters such as setting an "action-taken" flag or disposing of a successfully processed data item.

It is sometimes suggested that the syntax for the IF should be merged with that of the CASE, since the IF is really just a special case of CASE. A solid counterargument, however, is that the overwhelming majority of the choice constructs used are IFs (Alexander, 1972), and that the popularity of the IF (particularly with no ELSE clause) justifies special syntax to improve readability.

A more generalized CASE construct that has been suggested by Dijkstra (1975) and Weinberg et al. (1975) is of the form

tmp4C8-15_thumb[2][2][2]

In this form, each of the "Booleans" is evaluated; when one of them is true, the corresponding "statements" section is executed. It is not clear what ought to happen should two or more of the "Booleans" both happen to be true; possibly the first one to be true should be used. This may provide a more generally useful form of the sequential checking mentioned earlier as a plausible implementation of the usual CASE. This form appears potentially quite useful, especially if equipped with an ELSE and an ALSO clause. The Weinberg paper also has some other interesting ideas on choice structures.

The other major control construct is the loop. A large majority of its uses are accounted for by the simple case in which termination occurs on the basis of a Boolean condition. One problem is where the termination condition should be checked: at the beginning of each iteration, at the end, or even in the middle. All three cases are useful. By far the best solution is that proposed by Dahl (in Knuth, 1974) in which the check is potentially in the middle:

tmp4C8-16_thumb[2][2][2]

Either group of statements may be null. This construct covers all three forms. It also, by and large, eliminates the necessity for having special constructs which terminate loop execution or restart the iteration (often written "EXIT" and "CYCLE," respectively).

One minor question in Boolean-condition loops is whether to terminate repetition on the Boolean condition becoming true ("UNTIL") or becoming false ("WHILE"). The traditional choice is WHILE. The WHILE school claims that the UNTIL focuses attention on the termination of the loop whereas the WHILE focuses attention where it belongs, namely, on the nature of the repetition. The UNTIL school claims that the WHILE focuses attention on the mechanics of the repetition while the UNTIL focuses attention where it belongs, namely, on the (Boolean) condition that the loop is working toward.

The other notable form of loop is the iterative FOR loop, which alters the value of an index variable in some predefined way on each iteration. Although the FOR loop could be constructed by the user from an ordinary Boolean-condition loop, this construction process is tricky and error-prone, and hence it is a good idea to have a built-in FOR loop.

In order to achieve reasonable readability and reliability, a FOR loop needs some rather extensive restrictions. Modification of the index variable should not be permitted inside the loop; the compiler should enforce this restriction. Any values used to set the parameters of the index-variable alteration should be evaluated on entry to the loop and should not be alterable thereafter. It is not clear just what the value of the index variable should be on exit from the loop; a simple solution is to say that the FOR itself constitutes the declaration of the index variable and that the index variable is accessible only within the loop. This also relieves the programmer of the nuisance of having to declare the index variable.

The remaining problem is the nature of the variation of the index variable. The traditional approach is to specify an initial value, final value, and increment. Many designers (Hoare, 1972) tend to consider this overgeneralized and detrimental to readability, especially when applied to floating-point numbers. The approach taken by PASCAL is to restrict the increment of the loop to be 1 or -1 by giving the bounds of the loop as "x TO y" or "x DOWNTO y," and to prohibit floating-point index variables. It has been observed by Alexander (1972) that this covers the vast majority of cases. Note that there is no particular reason why the index variable should not be, for example, of type "character" rather than "integer." Consider the example:

tmp4C8-17_thumb[2][2][2]

(Note the use of "FROM" where some forms use the assignment operator; "FROM" is probably clearer.) It is desirable to exclude floating-point numbers, however; the peculiarities of floating-point arithmetic complicate things too much and thereby interfere with readability.

In languages with the "set" as a basic data type, it may well be desirable to have a FOR which simply sequences through the members of a programmer-supplied set:

tmp4C8-18_thumb[2][2][2]

Here the index variable x simply assumes in turn the value of each member of the set. In what order should the index variable assume the various values? Perhaps the least offensive solution is to say that the order is implementation-defined and should not be depended upon in programs.

Although the PASCAL approach to the FOR loop can be recommended, designers wishing to experiment have several other options. The set-driven FOR loop previously discussed is useful in a language which handles sets well. Another proposal, which has been experimented with by Gannon (1975), is a FOR loop of the form:

tmp4C8-19_thumb[2][2][2]

where on each iteration, the index variable x refers to one element of the array (i.e., any reference to x refers to that element of the array). This FOR loop has the decided advantage that nested loops are not necessary for work with multidimensional arrays. Its limitation is that FOR loops are not always for array handling.

The procedure is another fundamental control construct. The procedure is one of the most important tools for breaking big problems up into little pieces, and this has implications for its design. In particular, the procedure-call interface must be efficient, and it must be possible to check types of parameters and results fully.

One of the least-standardized aspects of procedures is the matter of parameter passing, although it now appears that two types of parameters are used most often. The first is the pass-by-value parameter, where an expression is evaluated and the result is made available as the parameter. The other useful type of parameter is the pass-as-variable parameter (also called pass-by-reference), in which a variable is passed in to be modified. This may be implemented either by passing a pointer or by copying in the initial value and copying out the final value.

The utility of passing procedures themselves as parameters is debatable for some types of languages. If this feature is rarely used the language designer is probably justified in omitting it, since it can be simulated with not too much difficulty, is awkward to implement, and makes it hard to do complete run-time error checking. The same comments apply to ALGOL 60′s pass-as-if-by-macro-substitution parameters.

The other side of the parameter-passing issue is the-matter of returning a value from a procedure. Some languages make a distinction between functions, which return values, and procedures, which do not. Such a distinction is probably most appropriate when; as in an early version of PASCAL, functions are not allowed to have side effects (i.e.. to alter parameters or global variables). It certainly corresponds well to the programmer’s intuitive idea of "doing something" versus "yielding a value." It should be noted, however, that it is very hard to enforce a no-side-effects rule.

The actual method of returning a value from a value-returning procedure is a problem for which several solutions have been found:

1. An explicit RETURN construct is used with provisions for tacking on a value, e.g., "RETURN(x)."

2. A value is returned by assigning the value into the procedure’s own name as if it were a variable.

3. In an expression language (see Sec. 3-5.2), the procedure’s executable body as a whole has a value, which is the one returned.

The disadvantages of an expression language narrow this down to the first or the second alternative. The second alternative is less attractive because it involves the use of the procedure name in a rather strange way, but it has the advantage that it is not coupled to a return-of-control construct. A designer who does not wish to return from procedures except by "flowing off the end" will probably end up adopting alternative 2; most other designers will probably opt for the convenience and cleanliness of alternative 1.

As a final matter of syntax we should address the question of what a call to a procedure should look like. ALGOL and its descendants simply use the procedure name as a statement. This seems the simplest approach for situations involving a returned value. Unfortunately, it makes pure procedure calls look just like variable accesses, which is probably undesirable, since procedures do not return values. For this reason, some language designers have introduced a special keyword (e.g., CALL) to clearly delineate the two types of procedure invocations.

No consideration of procedures should ignore the question of whether they should be implemented as subroutines or macros. Although the subroutine implementation is usual, it has been suggested that the programmer should be able to decide which method to use, independently of the contents of the procedure itself. When designing a language for implementing system software, or other tasks where execution speed may become critical, such flexibility may be beneficial. It should be noted, however, that there are problems in ensuring that the procedure sees the same environment either way.

There have been some studies (Wulf, 1972b) of the more general question of specifying the interface to a procedure (i.e., calling conventions) in a language-independent manner. This would simplify linkage between different languages, allow optimized linkage sequences for critical situations, and simplify handling of such things as variable-length parameter lists. The issue is, however, rather complex because machine dependencies creep in; no excellent solution has yet appeared.

The most controversial control constructs currently are the "escape" constructs, which allow departures from the "single-entry, single-exit" concept of control structures. The RETURN mentioned earlier in connection with procedures is an escape construct, and some languages do not have it. Generally, the RETURN is sufficiently useful and its operation is sufficiently clear that few people object to it.

Another highly restricted but very valuable escape construct is the STOP statement, which is used to terminate execution of the program. STOP is invaluable for coping with error situations; it should be available in any language which runs under an operating system (i.e., which can associate a meaningful action with STOP).

Of the more controversial constructs, probably the best is Zahn’s situation-driven escape, publicized by Knuth (1974). An example is roughly as follows (with some alterations to the original syntax):

tmp4C8-20_thumb[2][2][2]

where the body of the construct ("search for something") may contain any number of statements of the form

SIGNAL found

or

SIGNAL not.found

The execution of a SIGNAL statement causes immediate execution of the corresponding "statements" after the CAUSE, and then termination of the whole construct.

The use of situation names "found" and "not_found" in this example illustrates the major point of the construct. The search loop may terminate in one of two ways, each requiring a different action, and neither being truly "abnormal" compared with the other. The principal value of this particular construct lies in separating the cause of the exit (termination of the search) from the actions taken as a result, thereby rendering both the search code and the action code more readable.

Some work has been done, notably by Brinch Hansen (1973, 1974, 1975). on control structures for managing parallel processes. This area is maturing, and it is becoming increasingly important to have such features in a language, at least for some areas of work such as communications software.

An important area of control structures which has not been thoroughly investigated is the whole question of exception handling. How does one build control structures to cope with reporting and dealing with error conditions, whether detected by the hardware, the run-time checks, or the user’s own program? A number of open questions exist, and undoubtedly more research will be undertaken in this area.

A minor aspect of control structures is the question:

Where does the program start execution?

A good, readable approach is that taken in PL/I in which execution begins with a call to a procedure named "main." This also provides an elegant means of passing parameters to a program as parameters to "main." C (Ritchie, 1974) provides a similar type of construct.

One useful structure which has not become widely accepted is the coroutine. Coroutines can be considered as either a generalization of procedures or a specialized, highly efficient simulation of parallel processes. The idea is that an active coroutine may pass control to another active coroutine. When control returns to a previously invoked coroutine, this coroutine carries on where it left off, resuming execution immediately after the exchange-control construct that resulted from the last transfer of control.

Coroutines are particularly useful for situations where one part of a program "feeds" another part a stream of data; many programs can be characterized in this way. An easy way to implement such a program is by using parallel processes; unfortunately, this is prohibitively inefficient in most systems. Coroutines can be made almost as efficient as procedure calls. It is not clear whether the number of active coroutines should be constant at run time (i.e., a coroutine is a special sort of procedure) or variable at run time (i.e., a coroutine is a special sort of procedure invocation). It appears that the first alternative handles most requirements.

Two very useful constructs, which may not strictly belong under "control structures" but which do not clearly belong anywhere else, are assertions and invariance relations. These constructs basically offer the programmer an opportunity to state (Boolean) conditions which should be true at a point in the program (assertions) or everywhere in the program (invariance relations). These notions have not been applied to languages used for production programming mainly because of the difficulty in proving large programs correct. Again, this may change as research proceeds in this area, and unquestionably the development of EUCLID (Lampson et al., 1977) is a step in this direction. For further information in this area it is worthwhile to read the report on EUCLID.

Even if assertions and invariance relations are treated simply as another form of comment, they are valuable. It is also possible (Clark, 1971) to have the compiler generate code to check at run time that the conditions are true. This feature can assist verification considerably. Finally, there is the possibility of being able to apply automated theorem-proving techniques to such assertions; this would greatly simplify writing correct programs. One problem with this approach is that current programming languages are seldom at a high enough level to express complex assertions easily.

A number of control constructs have been proposed for nondeterministic operations and " backtracking" (unraveling the effects of previous code to go back and try again), as discussed in Prenner et al. (1972) and Bobrow and Raphael (1974). These constructs are particularly applicable in artificial intelligence.

We have intentionally ignored the GOTO construct because current experience indicates that with a decent set of control constructs, the GOTO is simply unnecessary (Wulf, 1972a). A common thought among proponents of the GOTO is: "I just might need it; something might come up." The answer to this appears to be: "Nothing ever does." It cannot be too strongly emphasized that this statement applies only if a good set of non-GOTO constructs is provided. Many languages are somewhat deficient in this matter.

The following control constructs are clear failures or primitive attempts at doing what has been done better and more simply:

1. LABEL variables.

2. Dynamic instruction modification (e.g., COBOL’s ALTER).

3. The overly restrictive FORTRAN DO-loop.

The designer is well advised to ignore these.

Compile Structure

" Compile structure" is the term used here to cover certain aspects of a language which tie in very strongly with the compilation process. The major topics of interest are directives to the compiler, operations within the language which are done at compile time rather than at run time, and the question of separate compilation of different "modules" of a program.

It is often desirable to give the compiler certain items of information which cannot easily be expressed in the language itself. The most prevalent approach to this is to have a special form of comment (the ALGOL 68 term " pragmat" seems to be the best word for it) which has no significance to the language itself but which contains directives to the compiler rather than normal comment text. One method of distinguishing pragmats from normal comments is to have them begin with the normal comment delimiter, which is then followed by a sequence of characters. This sequence should be exceedingly unlikely in comment text. For example:

// This is a comment. //:: This is a pragmat.

The actual contents of a pragmat are, of course, compiler-dependent; a pragmat might contain settings of options, optimization advice, specifications of listing format, and so on. Another, perhaps better, way of handling pragmats is to define them in the phrase structure of the language so that the parser can parse them and build trees for them if necessary.

Languages vary greatly in the amount of manipulation possible at compile time. It is notable that most facilities for compile-time operations are rather seldom used; one or two specific constructs seem to fill most needs.

The most notable such facility is the ability to request that the compiler insert the contents of another source file at a specific point in its input stream (the PL/I "^INCLUDE" facility). A language should have this feature or some equivalent facility. A pragmat could be used to specify the inclusion.

The other major compile-time facility is conditional compilation—the ability to ignore selective parts of the input text on the basis of compile-time conditions. While there are several ways of doing this, the method used by ALGOL 68 (Van Wijngaarden et al., 1975) is very noteworthy, since it does not require the introduction of yet another construct into the language. The ALGOL 68 approach is simply to check the selection expressions of IFs and CASEs to see if they can be evaluated at compile time. If they can, only one of the alternative statement sequences need be compiled (note that it is still desirable to error-check the others). This approach is simple to implement and works well.

The issue of separate compilation has generated considerable debate. Its desirability is not in question, as anyone who has ever implemented a large software system will testify to its value. The fact that just about every systems-implementation language includes some sort of separate-compilation facility is additional evidence.

The main difficulty is the poor support for separate compilation in current systems. It is vital to a good separate-compilation system that there be some way to verify the correctness (e.g., matching data types) of calls to separately compiled modules. Unfortunately, most current linkage editors (the usual tools for putting modules together) do not include facilities for such checking. The result of this is that compiler writers usually must implement their own linkage system, which is not necessarily a separate program.

A major language-design question is:

What should the form of the separately compliable "module" be?

The traditional answer is "a procedure"; unfortunately, this does not fulfill the frequent requirements for having "static" or "own" data items associated with a procedure or with a group of several procedures.

The best approach is probably one in which a module contains both data items and procedures. It may be desirable to restrict the scope of the data items to the module itself, so that they may be accessed from outside only via the procedures. Such an arrangement provides a crude form of the data-abstraction facilities mentioned earlier. If the language contains one of the more complex data-abstraction techniques, that technique may be used as the basis for the definition of "module." Ada supports many of these ideas and a discussion of these features is given in Sec. 3-8.

It is desirable to specify the possible interactions between modules instead of adopting an "anything goes" policy; some work has been done on this (DeRemer, 1976), especially as these concepts relate to the implementation of data-abstraction facilities.

I/O Structure

A final important component in any programming language is the set of facilities for handling input and output. The language designer cannot choose to ignore

this aspect—the bad experience resulting from the design of ALGOL 60 dictates that this is so. Undoubtedly, ALGOL 60 would be a much more popular language today had the ALGOL committee defined an input/output structure.

Input and output facilities can be provided at three levels of sophistication in the language. A first level is commonly called the "format-free" form of input/output. PL/I’s data-directed and list-directed I/O and COBOL’s DISPLAY and EXHIBIT statements are examples of this type of I/O. The main function of these statements is to provide a simple form of program/programmer communication to verify program correctness. The programmer should be allowed to display easily values of important variables and to check quickly the logic of a program on small, selected sets of input data.

At the second level is the formatted form of input/output. In this form the value of each variable in an input or output list is read or written according to predefined format controls. These controls usually indicate a field length and data type which are applied to the value of a particular variable in the format list. FORTRAN’S formatted I/O and PL/I’s edit-directed I/O provide examples of this type of structure. In COBOL, the format specification is tied to a record description which is physically separate from the I/O statement. Unfortunately, if a programmer wishes to handle a record for input/output differently from how it is stored, two record descriptions have to be created—one reflecting how the data are to be displayed or are organized for input and one dictating how the information is stored.

The C language presents an unusual and innovative variation for formatted I/O. The format control characters are included as part of a textual string used for annotation. For example:

printf("the value of x is %d:\n the value of st is %s",x,st) will print

the value of x is 4:

the value of st is string

if x = 4 and st = "string." The format control characters d and s correspond to integer (base 10) and string formats, respectively, and the " \ n " indicates a new line.

A main consideration with formatted I/O is whether or not to allow format specifications to vary dynamically at execution as is accomplished, for example, in ALGOL 68′s format statement. The following statement illustrates this concept:

outf(stand out;

$c("SUN’\ "MON", "TUES", " WEDNFiS ", "THURS". "FR1 ", "SATUR") "DAY" $,i);

The name of the appropriate day will be output based on the value of i.

Considerable overhead is incurred by allowing dynamic format specifications; however, for applications involving a significant amount of sophisticated I/O editing, the overhead may be warranted.

The final type of I/O really involves facilities for the storage and retrieval of information in files. It is generally agreed that there are three types of file organizations: sequential, indexed (or indexed sequential), and direct. Other organizations based on secondary keys are possible but are generally not provided in a high-level procedure-oriented language. Since the access facilities for each of the file organizations are usually provided through system routines, the types of files expressible in a language are somewhat machine-dependent. In addition, the types of facilities provided depend on the amount of data-processing activity expected to be completed using the language.

REDUCTION OF SIZE

Although "keeping it small" is important throughout the design of a language, it is useful to pay special attention to size reduction once the design has solidified.

The basic reasons for keeping a language small can be summed up simply: the smaller the language, the easier it is to read, the easier it is to write, the easier it is to produce a reliable, efficient compiler, and the easier it is to write clear, readable documentation describing it. The basic reason for increasing the size of the language is the so-called Turing tarpit: if the language is too simple, one may be able to do anything in it but it may be impossible in practice to do anything in particular.

The major mechanism for eliminating constructs from a language is reviewing them for usefulness. Will the construct be used? Does it duplicate the facilities of another construct? Is there a simpler subset of the construct which would account for almost all uses of it (i.e., is the construct overgeneralized)?

The removal of a construct must be considered carefully as its removal will require the user to simulate its effect. If the construct will never be used, obviously it should be removed. If the construct will be infrequently used but is difficult or impossible to simulate with the other constructs, it should probably be left in (the CASE statement is a good example). Assuming that the construct will be used with moderate frequency and that it is not overly difficult to simulate, the final consideration is that simulating it will inconvenience the user, increase the size and complexity of the program, and decrease the program’s reliability because of the possibility of an- erroneous simulation. The designer must weigh these considerations carefully.

Next post:

Previous post: