Programming-Language Design (Compiler Writing) Part 5


This section is concerned with certain aspects of the practice of programming-language design. In particular, we discuss a number of tools which can be used to specify a language, the problem of evaluating a language design, and language-implementation considerations.

In any discussion of language-design tools, a few words should be said about terminology. Good, well-thought-out terminology can help make language documentation easier to read and understand, even during the design process itself. If possible, new terms should not be coined; most of them are already available for your use. The ALGOL 68 documentation (Van Wijngaarden et al., 1975; Lindsey et al., 1972) is an excellent example of terminology which is different (seemingly) but not significantly different from a functional point of view. It is undesirable to force your users (or yourself) to learn another new language just to discuss your programming language.

The method you use to describe your language is a significant design tool. Currently, language description is split into three parts; context-independent syntax (what is normally thought of as syntax), context-dependent syntax (type checking, consistency, etc.), and semantics (what the constructs do).

Descriptive techniques for context-independent syntax have reached an adequate level of formalization while retaining simplicity. BNF has been around for quite a while, and with all of its simplicity, it is almost adequate. With the additions to BNF discussed in Sec. 2-4.4, it will serve well enough for just about any design process. For final documentation, the syntax charts (see Jensen and Wirth, 1975, or Barnard, 1975) are perhaps clearer (although Barnard’s form is not as general as BNF). For the design process, however, syntax charts are rather unwieldly; extended BNF is considerably easier to write. There is no evidence that more complex methods than these are either necessary or desirable for describing reasonable languages.

Context-dependent syntax (this is a poor term, but there does not seem to be a better one) is a somewhat stickier problem. The attempts that have been made to produce a formal notation for it, notably the two-level grammars used to describe all the syntax of ALGOL 68, have resulted in notations that are cumbersome to use and obscure to read. Currently, narration is probably the best descriptive technique for context-dependent syntax.

Several methods are available for describing the semantics of a language. Some of the best methods for practical use are those developed by Hoare (1969) for control structures and Hoare, Dahl, and Dijkstra (1972) for data structures. Hoare’s control-structures method has gained general acceptability; however, his data-structures method is less completely accepted, since data structures are still the subject of much dispute. Hoare and Wirth (1972) have used these methods to define most of the semantics of PASCAL in a document which is remarkably readable.

It is possible to describe a language either top-down (start with "program") or bottom-up (start with "number," "identifier," etc.). For reasonably sized languages, there is little to choose between them. The top-down approach is perhaps slightly clearer for situations where there is little nesting of definitions (e.g., the description of "statement"). The bottom-up approach seems definitely superior for cases involving much nesting (e.g., the description of "expression").

For descriptive purposes, it is often much clearer to use a slightly ambiguous definition rather than go through the complexities necessary to remove all ambiguity. Obviously, this point is applicable primarily to the user’s manual, not to the formal definition.

It is worth pointing out that formal descriptions are not just methods of ensuring that all implementations do things the same way. They serve also as a method of uncovering weak spots in the design, by requiring the designer to put everything down on paper. An attempt should be made to formalize context-independent syntax (at least) as early as possible. It is a good way of discovering that you have not really thought about some parts of the language.

Formal descriptions should be as readable as possible. An all too common problem is the use of needlessly unreadable notation in descriptions (e.g.. using single characters, often from strange alphabets, where mnemonic names should be used).

A language is of little worth unless it is properly documented. An informal presentation of the language for the user should be written before the design is finalized. Not only will this ensure that this vital task gets done, but trying to explain things simply is a fine way of finding problems and omissions.

Remember that at least four different kinds of documents are involved—the design working documents, cleaned-up formal presentations for reference (especially for implementers), informal presentations for regular users and experienced newcomers, and tutorial presentations for inexperienced newcomers. Tutorial presentations may be skipped if the new users will be experienced programmers, but both formal and informal presentations are probably essential.

Although Hehner (1975), for one, has experimented with top-down design of languages, and this idea certainly has some merits, the method invariably used today for language design is bottom-up. The approach usually used is to combine a number of features into a language, modifying the features as necessary to make the language fit together smoothly. Attention needs to be given to both the selection of features and their combination.

If it is vital that the language be implemented and running quickly (say, for a project which is waiting for it), then stick to old, proven features. Remember that if a feature is new and controversial, a better form of it may evolve overnight. Consequently, you will be left with the choice of either making your language conspicuously obsolete or falling prey to the "moving-target" problem. It is very difficult to implement a language which keeps changing.

It is important to watch out for unfavorable interaction if several features are combined into one. Such interactions can cause even greater problems than poor features. Often two features drawn from different languages will overlap somewhat when they are put together. For example, if you borrow PASCAL’S data types but also include variable-length strings, there is little point in supplying PASCAL’S single-character variables. In fact, there is potential built-in confusion: is ‘ x1 a character or a character string?

We now turn to the important problem of evaluating a language design. After the first draft of the design is finished, it should be subjected to a variety of evaluation processes to determine what changes should be made (don’t assume that there won’t be any!). Except in an emergency, this phase of the design should be carried out in a very leisurely fashion to allow everybody to have second (third, fourth, etc.) thoughts about everything.

To avoid the problem of tunnel vision (you have worked on it so long that you have convinced yourself that it is good), see what other people think of it. Try it out on potential users, remembering that their complaints and suggestions should be examined carefully—what they want is not always what they say they want. If you have people on hand with experience in language design, try it out on them. Some of their criticisms may derive from differences in opinion rather than from real flaws, but they will usually make this clear. Remember that you do not have to accept other people’s suggestions—but do not ignore them.

Try some nontrivial sample programs in the language. Set yourself a serious programming goal and then try to achieve it within your language. Do not deliberately avoid the unpleasant areas; in particular, try out your I/O on a real situation. It is probably a good idea to deliberately tackle at least one problem which will result in several hundred lines of program. The time invested will most likely be justified by the results.

When you are doing sample programs, there are a few distinctive signposts of trouble that should be watched for. GOTOs and "temporary" variables are often symptomatic of defects in either programming technique or language design. If you find yourself needing either one, ask yourself whether this is likely to be a frequent situation. If so, do something about it. Another, more generalized symptom of trouble is having to repeat yourself frequently. Such repetition is an error-prone nuisance; users do not like it, and they are justified in this.

Consider what sort of code the compiler will generate. Perhaps you might try hand compiling one or two of your sample programs. This is a good way of uncovering mismatches between the language and the machine.

Finally, let it stew a while. If you find yourself thinking "that really should be done better—but that way will do," drag your dissatisfaction out and look at it. It will be a lot easier to fix now than when the implementation is half complete. Make a list of things you think are flaws—no matter how trivial they seem. Think about them. If you think about them long enough, you may find yourself fixing them. Be careful to do this in advance of the actual implementation effort, as leaving it to a later date can lead to a severe case of the moving-target problem.

After the language design has been properly evaluated and the language is settled, the next step is to implement it. Once the implementation starts, your worries are almost over. Ideally, you will be implementing it yourself. If not, keep in close touch with the implementers; make sure they commit no heresies, but think seriously about any criticisms they offer. A more serious problem will be the inevitable requests from somebody (or everybody) else for changes.

The moving-target problem is the single worst obstacle a language implementation can face. There are two notable techniques for avoiding it. First, a useful way to deal with change requests which are clearly ridiculous but which cannot be ignored for "political" reasons is to reply "certainly, in the next version." The idea here is to get a working version up and then think about improvements. I his method should be used only to forestall changes that are clearly ridiculous.

The second method is more generally applicable, and was used by the SUE group at the University of Toronto (Clark, 1971). The idea here is to schedule "change points" at fairly wide intervals, say 3 months. Between change points, the language is frozen. Suggested changes are accumulated and thought about until the next change point and are then acted on if they seem justifiable. This method results in comparatively few changes, at regulated times, and tends to defuse the moving-target problem.

It is highly desirable that the whole language design be the responsibility of one person. Consultation with others is desirable, but one person should have complete and final say. Committee languages generally are "kitchen sink" languages; such a language seldom exhibits either unity or cleanliness (consider PL/I as an example). It is also desirable to have a single implementer if at all possible.

The basic scenario of language design is described in the following informal algorithm.

1. Ask questions: What is wanted? How will it be used?

2. Consider possible features.

3. Consider the overall design and the integration of the features into it. Watch feature interactions.

4. Consider details (syntactic carbohydrates, etc.). Consider parsing and error checking.

5. Write formal definitions. Write a user’s manual. Write anything else that strikes you as useful.

6. Evaluate the design as discussed earlier. Make changes and begin step 3 again. Do not proceed to step 7 until you are completely satisfied.

7. Reduce the size. Iterate earlier steps if necessary. Do not proceed to step 8 until you have exhausted all possibilities.

8. Implement it.

9. Wait for the user reaction.

10. Admit that you may have made mistakes.

11. Start on a new version of the language considering possible extensions.

In closing, it should be remembered that research in the field of language design is ongoing, and it is highly likely that some of the more serious problems will be solved in the near future. We next consider certain design aspects of Ada.


Ada is a general-purpose language for numeric computation, general systems programming, general applications programming, real-time industrial applications, and embedded computer applications. (Embedded systems are systems in which a computer is but a part of a large whole, for example, a computer is embedded in a ship, a cruise missile, or an aircraft.) Ada was named in honor of Ada Augusta, Countess of Lovelace and daughter of the poet Lord Byron. Ada Lovelace was a colleague of Charles Babbage. Babbage had created a difference engine that could be programmed in a manner similar to a Jacquard loom. Since Ada often programmed the difference engine, many consider her to be the first programmer.

The U.S. Department of Defense (DOD) sponsored the development of the language. The purpose of this sponsorship was to develop a programming language suited for embedded computer applications (i.e., military-related applications). The development effort began in 1975 with the formation of the DOD Higher-Order Language Working Group whose mandate was to establish a single high-level programming language for developing embedded computer applications.

Initially, a set of requirements for a common programming language was produced. These requirements were produced in a sequence of reports called Strawman, Woodman, Tinman, Ironman, and Steelman. Input to these documents came from industry, universities, and the DOD. The notion of designing a programming language based on a set of requirements (Steelman) was a new approach. These requirements specified characteristics in areas such as data types, modules, tasks, and control structures. Also included were requirements such as readability and simplicity.

By 1977 it became clear that no existing programming language met the Tinman requirements and that a new language should be designed and developed. It was recommended that PASCAL, PL/I, or Algol 68 be used as a starting point in the design of the new language.

Four of the sixteen language design proposals received were deemed to be acceptable and were funded for a six-month preliminary design phase. The four winning designs, all based on PASCAL, were submitted for evaluation. Because of their PASCAL base, the preliminary designs were produced in six months.

The four preliminary designs were evaluated by many (around 80) evaluators from industry, government, and universities. These evaluations resulted in two of the four languages being selected for one year of further development. The two completed designs were subsequently analyzed by some 50 analysis teams. Their findings were evaluated and the winning language proposed was named Ada. A preliminary reference manual for Ada (SIGPLAN Notices, 1979) and a rationale for its design (Ichbiah et al., 1979) were produced. A more recent version of the reference manual was published in 1983 (ANSI, 1983).

Although many computer scientists and users participated in the design of Ada, it was not designed by a committee.Ichbiah assumed complete responsibility for all the design decisions. Ada appears to be a coherent, integrated whole rather than a collection of interesting but disjointed features.

This section first examines Ada with respect to the design criteria presented earlier. During the development of the Ada language, it became obvious that the language was only one component of a set of tools which would be required for the development of software throughout its life cycle.

The second subsection briefly examines the features of an Ada programming support environment (APSE). Ada is also a part of a larger Department of Defense effort to improve software technology. This effort is called the Software Technology for Adaptable Reliable Systems (STARS) program. We also outline the basic components of the STARS program.

Evaluation of Ada from a Language Design Viewpoint

Here we consider Ada from the programming language design criteria presented in Sec. 3-4. In certain instances Ada is discussed with respect to more than one criterion. For example, the criteria of usability and programming effectiveness are combined. In particular we discuss Ada with respect to the following issues in turn: usability and program effectiveness (including abstraction); machine independence and portability; efficiency; modularity and maintainability: compilabil-ity and compile structure; and simplicity. We also examine some drawbacks of the Ada language. The discussion contains some Ada program segments.

Recall that Ada is similar to PASCAL. Ada contains the following levels of program structure: characters, lexical units (or tokens), expressions, assignment statements, control structures, declarations, program units, and compilation units. A program unit associates declarations, which define the attributes of identifiers with statements, which use these declarations. A compilation unit, on the other hand, is a unit of structure for developing a program and separate compilation.

A compilation unit is either a subprogram or a module. As in PASCAL, an Ada subprogram is either a function or a procedure. A module, however, is a language construct that reflects a recent development in programming language design. A module can be either a package or a task. Figure 3-1 exhibits the composition of a compilation unit.

A package is used to specify a collection of logically related computational resources. For example, a stack package can provide a collection of resources for representing and manipulating stacks. Operations for pushing elements onto and popping elements from a stack would be included in such a package. A task is a modular program unit that is used in the specification of concurrency. Here we deal with both kinds of modules. Much greater emphasis, however, will be given to packages.

Usability and program effectiveness. The value of abstraction has been recognized by programmers in recent years. Most programming languages support abstraction. In Ada, however, the concept of abstraction has been elevated to a prominent position. Ada also directly supports the development of software by both bottom-up and top-down methodologies. First we discuss the support of abstraction in Ada by using packages, generics, and tasks. Second, a brief description of Ada’s support of structured software development is given.

Classification of compilation units in Ada

Figure 3-1 Classification of compilation units in Ada

Packages The concept of a package in Ada is an important contribution to the theory of programming. A package, which is used to define logically related resources, consists of two components: a package specification and a package body. Each component is a unit that can be compiled separately. The package specification component describes the resources available in the package to a programmer. This component is usually written and compiled before use is made of the package. The package bod^ specifies how the resources of the package are implemented.

The structure of a package is given in the following example:

package package_name is : visible part private : private part

package body package_name : package body end package_name

The package specification part consists of a sequence of declarations for data types, data objects, subprograms, and so on. The declarations preceding the keyword "private" make up the visible part of the package specification. The remainder of the package specification, which is optional, is the private part of the specification. An example of a package that contains a private specification part is given later. The declared names identify the resources in the form of functions and procedures. Recall that names can also identify data types, data objects, and so on.

The package body describes how the resources or facilities mentioned in the package specification are realized. The body may include declarations of local variables, implementation of subprograms, initialization specification, and so on.

An example of a package that facilitates the manipulation of complex numbers will now be given. The operations on complex numbers are restricted to adding, subtracting, and multiplying complex numbers. The package specification in the example has a data type declaration (COMPLEX) that contains two elements—the first (R) represents the real part of the complex number and the second (I) is its imaginary part. Each complex operation has as input two complex numbers and produces as a result a complex number.

A package specification for the complex number application may take the following form:


In this example three functions, denoted by " + ", " – ", and " * " are defined. In these instances, only the name, the type of returned value (e.g., COMPLEX), and the names and types of the formal parameters are given.

Ada permits names of subprograms to be overloaded. That is, more than one subprogram can have the same name provided the different cases are distinguishable by examining the types, number, and names of parameters and the type of the result (for a function). Conceptually, overloading supports abstraction since the same subprogram name can be used for equivalent operations on different data types. In the current example, the operators’" + ", " — ", and " * " are all overloaded. Note that in these instances the predefined operations are overloaded.

The package specification for complex number manipulation can be separately compiled. All names in the specification part, such as COMPLEX and the record components R and I, are all accessible or visible to the user.

The following example illustrates this flexibility.


A package body for the previous body specification for complex number manipulation could be as follows:


Although the current complex number package treats complex numbers as abstract objects, it does permit the manipulations of their real and imaginary components. Such flexibility, however, should not always be made available to the user. The components of a complex number can be made unavailable to the user by declaring type COMPLEX to be a private data type. The following example exhibits such a private data type.


In this example the user can declare complex variables. These variables can be manipulated by the operators defined in the package. The user, however, cannot refer to the individual components of complex numbers. Consequently, we include the function CREATE_COMPLEX, which creates a complex number from its real and imaginary components. The private data type COMPLEX is said to be hidden from the user.

In summary, packages support abstraction by allowing irrelevant implementation details to be hidden and make visible to the user only high-level resources.

Generics Generic program structures and generic packages are mechanisms for supporting abstraction. Generic program structures support abstraction by using a common program structure and parameters to capture the similarities among several program subprograms. We first discuss generic program structures and then extend these notions to packages.

The notion of using parameters to support abstraction has existed for a long time. Subprograms have been the main vehicle for this type of support. In Ada, subprogram parameters must be variables. Ada, however, allows generic program structures. These structures have parameters whose values can be variables, types, and subprogram names.

Generic subprograms can be viewed as a macrofacihty in a macro-assembly language. As such, a generic subprogram cannot be called directly. Instead it must be referenced with generic parameter values and instantiated at compile time. This instantiated version can then be executed. The compile-time process of instantiating a generic subprogram yields instances of the parent generic subprogram.

The following subprogram is an example of a generic procedure that interchanges two data types.


Observe that the program contains a generic clause with a parameter type T, which is the generic parameter of the procedure interchange. Three instantiations of the procedure INTERCHANGE, denoted by INTERCHANGED INTERCHANGE.R, and INTERCHANGED, that interchange integers, reals, and vectors are:


The interchange procedures generated for each of these instantiations at compile time will have different codes, especially, for the procedure that interchanges vectors.

The previous example has shown that the generic variable can be a type. Generic parameters can also denote subprogram names. For example, a generic

The previous example has shown that the generic variable can be a type. Generic parameters can also denote subprogram names. For example, a generic subprogram could be written to obtain the definite integral of an arbitrary function In this case the generic variable would denote a function name.

Ada also permits the writing of generic packages. There are. however, differences between generic subprograms and generic packages, which makes a generic facility for packages more useful than that for subprograms. First, subprograms have parameters while packages do not. Therefore, the generic parameter is an improvement of an already existing feature in subprograms, whereas it provides a new facility for packages. Second, subprogram declarations are essentially instantiated by a subprogram invocation. Package specifications, on the other hand, cannot be multiply instantiated. Consequently, a generic facility for subprograms provides a compile-time version of an already existing facility available at run time through subprogram invocations while a generic facility for packages provides a new facility.

As an example of a generic package, consider the design of a flexible package for storing symbol table values. A generic package specification for such an application might take the following form:


Observe that the package specification contains two generic parameters—a type parameter and a variable parameter. By instantiating this generic package, several symbol tables of different value types and sizes can be created. For example, the following statements instantiate three separate symbol tables:


The first symbol table contains 50 elements of type integer. The second symbol table contains up to 75 real numbers. The third table can contain up to 100 strings.

Tasks Ada provides facilities in real-time applications for multitasking (i.e., concurrency). A task is a module that can be compiled independently. Like a package, a task is divided into a specification part and a body. The task specification part contains entry resource declarations that define the procedurelike calls, possibly to other tasks that can be used to communicate with the task. Entry calls, however, require synchronization between the calling and called tasks before they can be executed. The intertask communications are handled by a "rendezvous" in the form of an accept statement. During an accept statement only one path of execution is possible. After completing the executions of an accept statement, the rendezvous is said to be complete and the tasks then continue in an independent manner.

Support of software development Ada supports both bottom-up and top-down software development methodologies.

Bottom-up program development involves the implementation of higher-level program units in terms of already implemented lower-level program units. Most high-level programming languages support bottom-up program development because many of these languages have a subprogram facility that has a well-defined user interface that "hides" the details of implementation.

Top-down program development, on the other hand, begins from a high-level specification that is successively broken down into lower-level specifications until the specifications have been refined at the lowest level of the programming language being used. This approach is more in line with current software development methodologies since it corresponds more closely to the tasks that are performed by systems analysts. The top-down approach to program development, however, is more abstract than its bottom-up counterpart because program specifications are expressed in terms of lower-level specifications (e.g., modules) that are not yet implemented. Another advantage of top-down development is the early verification of the interfaces between program modules. In a bottom-up approach, the interfaces are the last (and more difficult and time consuming) things to be checked out.

Ada is well-suited to support both bottom-up and top-down methodologies because the module (package and task) facility permits the abstraction of the algorithm and data objects.

Ada directly supports top-down development through the aid of program units and subunits. A stub indicates the syntactic-interface information and the location where a separately compiled subunit will eventually be placed. For example, a programmer may write the following:


where the keyword "separate" indicates to the compiler that the subunit containing the procedure P with the parameter A (of type TYPE_A) will be compiled later.

At a later time, a programmer may compile the following subunit:


In this example, the keyword "separate" and its argument SAMPLE indicate to the compiler that the procedure P that follows is a subunit of SAMPLE and is to be compiled as though it were completely in the package SAMPLE outlined earlier. The compiler can recall the context surrounding the declaration of P in SAMPLE before proceeding to compile procedure P.

The previous examples illustrate that both Ada subprogram and package specifications can be treated as separate linguistic objects. Such specifications allow the programmer to specify the syntactic form of the interface with an indication that the semantic interface will follow separately. The compiler can also check the validity of the syntactic interface. This approach in Ada represents a novel programming language design direction which is a significant advance in the support of programming methodologies.

Machine independence and portability. Ada allows machine independent programs, even for floating-point numbers. Ada facilities program portability by segregating and controlling references to non-Ada environments.

Representation specifications allow a programmer to specify a straightforward mapping between a data object and its storage representation. Consequently, the structure of machine registers or hardware-defined data structures can be specified.

Second, special constructs called pragmas facilitate the specification of compiler directives that influence the compilation of a program without affecting its meaning. For example, the interface pragma indicates that a procedure has been written in some other language.

Third, there exist predefined set programs that enable an escape from the type system. For example, a programmer may view an object of one type as one of another type. A floating-point number can be viewed in terms of its hardware representation.

Portability is also one of the important issues in the Ada programming support environment. This environment is outlined in Sec. 3-8.2.

Efficiency. A potential drawback of using Ada is the size and complexity of the compilers required. These limits may restrict the types of computers on which Ada can be readily implemented. Another efficiency consideration involves the efficiency of object code produced. Since Ada was designed for the programming of embedded systems, which have severe time and space restrictions, this concern is understandable. These issues, however, are presently being examined by compiler writers.

Modularity and maintainability. Several kinds of inconsistencies that usually occur during the development of large systems will not occur in systems developed in Ada. A compilation data base allows modularization and separate compilation of components while still being able to perform type-checking on the entire program. The package and task facilities in Ada permit the writing of very modular programs. For example, the separation of the logical and physical interfaces by having a specification part and body in a package leads to the following maintainability advantages:

1. Any changes to a package body do not require any alterations to the source programs that reference the package. Furthermore, these programs need not be recompiled.

2. Any changes made to the private portion of the specification part of a package do not require changes in the source code of programs that reference the package. These changes, however, may require the recompilation of the source programs.

3. Any changes to the visible portion of the specification part of a package may require changes to the source program and, consequently, must be recompiled. For example, another procedure may be added to the visible part of a given package, or the number of parameters of a given procedure within a package may be altered.

Compilabiiity and compile structure. Units of Ada programs are separately compilable. In many instances, separate compilation of programs has led to problems. For example, it is possible to invoke a procedure with several parameters defined in one module with the wrong number of arguments in the invocation which occurs in a different module.

The Ada approach is to allow the separate compilation of several units. However, information about other units may have to be present. A data base, which is managed by the Ada compiler, is used to keep track of the various compilations and their logical interdependences between units in the separate source programs. Also, it is intended that the Ada compiler provide a query capability for checking the recompilation status of the database.

As mentioned earlier, certain kinds of program changes will require the recompilation of other dependent units. The Ada database, however, contains information that assists in reducing the amount of recompilation required. Furthermore, the approach is meant to preserve the same degree of consistency, regardless of the number of separately compiled units.

Ada strongly supports compile structure. As mentioned earlier, Ada contains programs that give directives to the compiler. Also, Ada has the ability to request that the compiler insert another source file at a specific point in the input stream. Finally, Ada, through its package facility, supports separate compilation.

Simplicity. Although elegant, Ada is not a small language. Its size and complexity are a well-publicized complaint of the language. A recent paper (Wickmann, 1984) addresses the size of Ada. It appears that not too much can be pruned from the language without reducing its capabilities. Although the language is huge, it appears to be piecewise simple. There are ways to manage this complexity. The first is to improve the training of programmers. Another way to manage complexity is to require the use of techniques to control complexity. For example, packages and generics can be used for this purpose.

In summary, it appears that Ada is overall a well-designed language that promises to become prominent during the next decade. One of the attractive aspects of Ada is the environment in which it is to be used. The next subsection examines this environment.

Next post:

Previous post: