High-Level Language Part 1 (PIC Microcontroller)

All the programs we have written in the last six topics have been in symbolic assembly language. Whilst assembly-level software is a quantum step up from pure machine-level code, there is nevertheless still a one-to-one relationship between machine-level and assembly-level instructions. This means that the programmer is forced to think in terms of the MCU’s internal structure, that is of registers and memory, rather than in terms of the problem algorithm. Although most assemblers have a macro facility, whereby several machine-level instructions can be grouped to form pseudo high-level instructions, this is only tinkering with a difficulty that is inherent in any machine-oriented language. To improve the effectiveness, quality and reusability of a program, the coding language should be independent of the underlying processor’s architecture and should have a syntax oriented to problem-solving.

We are not going to attempt to teach a high-level language in a single short topic. However, after completing this topic you will:

• Understand the need for a high-level language.

• Appreciate the advantages of using a high-level language.

• Understand the problems of using a high-level language for embedded microcontroller applications.

• Be able to write a short program in C.

The difficulty in coding large programs in a computer’s native language was clearly appreciated within a few years of the introduction of commercial systems. Apart from anything else, computers became obsolete with monotonous regularity, and programs needed to be rewritten for each new model. Large application programs, even at that time, required many thousands of lines of code. Programmers were as rare as hen’s teeth and worth their weight in gold. It was quickly deduced that for computers to be a commercial success, a means had to be found to preserve the investment in scarce programmers’ time. In developing a universal language, independent of the host hardware, the opportunity could also be taken to let the programmer express the code in a more natural syntax related to problem-solving, rather than in terms of memory, registers and flags.


Of course there are many different classes of problem tasks to be coded, so a large number of languages have been developed since.² Amongst the first were Fortran (Formula Translation) and COBOL (Common Business Oriented Language), which appeared in the second half of the 1950s. The former has a syntax oriented to scientific problems and the latter to business applications. Despite their having been around for over 40 years, the inertia of the many millions of lines of code already written has ensured that many applications are still written in these antique languages. Other popular languages include Algol (Algorithmic Language), BASIC, Pascal, Modula, Ada, C, C++ and Java.

Although writing programs in a high-level language may be easier and more productive for the programmer, the process of translation from the high-level source code to the target machine code is much more complex than the assembly process described in topic 8. The translation package for this purpose is called a compiler and the process compilation.

The complexity and cost of a compiler were acceptable on the relatively powerful and extremely expensive mainframe computers of that time. However, until the mid-1980s the use of high-level languages as source code was virtually unknown for MPU-controlled circuitry. In the last decade the easy availability of relatively powerful and cheap personal computers and workstations capable of running compilers, together with the growing power of MPU/MCU targets and the financial importance of this market, has meant that the majority of software written for such targets is now in a high-level language.

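If you are going to code a task in a high-level language to run in a system with an embedded MCU, for example a washing-machine controller, then the process is roughly as shown in Fig. 9.1. The compiler translates the high-level source code into assembly-level code, which is then assembled and linked with any other modules in the normal way to produce the executable machine code for the target.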

Fig. 9.1 Conversion from high-level source code to machine code.

The choice of a high-level language for embedded targets is crucial. Of major importance is the size of the machine code generated by a high-level language implementation of a task, compared with the equivalent assembly-level solution. Most embedded MCU circuitry is lean and mean, such as the remote controller for your television. Lean translates to physically small and mean maps to low processing power and memory capacity, and cost! Most low-cost MCUs have a low-capability processor with a few hundred bytes of RAM and at best a few kilobytes of ROM program store. Thus to be of any use, the high-level language and its compiler must generate code that, if not as efficient as assembly-level (low-level) code, is at least in the same ballpark.²

By far the most common high-level language used to source code for embedded MPU/MCU circuitry is C. Historically C was developed as a language for writing operating systems. At its simplest level, an operating system (OS) is a program which makes the detailed hardware operation of the computer’s terminals, such as keyboard and disk organization, invisible to the operator. As such, the writer of an OS must be able to poke about the various registers and memory of the computer’s peripherals and easily integrate with assembly-level driver routines. As conventional high-level languages and their compilers were profligate with resources, depending on a rich and fast environment, assembly language was mandatory up to the early 1970s, giving intimate machine contact and tight fast code. However, the sheer size of such a project means that it is likely to be a team effort, with all the difficulties in integrating the code and foibles of several people. A great deal of self-discipline and skill is demanded of such personnel, as is attention to documentation. Even with all this, the final result cannot be easily transplanted to machines with other processors, needing a nearly complete rewrite.


Fig. 9.2 Onion skin view of the steps leading to an executable program.

In 1969 Ken Thompson, an employee at Bell Laboratories, developed the first version of the UNIX operating system. This was written in assembly language for a DEC PDP-7 minicomputer. In an attempt to promote the use of this operating system (OS) within the company, some work was done on rewriting UNIX in a high-level language. The language CPL (Combined Programming Language) had been developed jointly by Cambridge and London universities in the mid-1960s, and had some useful attributes for this area of work. BCPL (Basic CPL) was a somewhat less complex but more efficient variant designed as a compiler-writing tool in the late 1960s. The language B (after the first letter in BCPL) was developed for the task of rewriting UNIX for the DEC PDP-11 and was essentially BCPL with a different syntax.

Both BCPL and B only used one type of object, the natural size machine word – 16 bits for the PDP-11. This typeless structure led to difficulties in dealing with individual bytes and floating-point computation. C (the second letter of BCPL) was developed in 1972 to address this problem, by creating a range of objects of both integer and floating-point types. This enhanced its portability and flexibility. UNIX was reworked in C during the summer of 1973, comprising around 10,000 lines of high-level code and 1000 lines at assembly level. It occupied some 30% more storage than the original version.

Although C has been closely associated with UNIX, over the intervening years it has escaped to appear in compilers running under virtually every known OS, from mainframe CPUs down to single-chip MCUs. Furthermore, although originally a systems programming language, it is now used to write applications programs ranging from Computer Aided Design (CAD) packages down to the intelligence behind smart egg-timers!

For over 10 years the official definition was the first edition of The C Programming Language, written by Brian W. Kernighan and Dennis M. Ritchie, the language’s originator. It is a tribute to the power and simplicity of the language that over the years it has survived virtually intact, resisting the tendency to split into dialects and new versions. In 1983 the American National Standards Institute (ANSI) established the X3J11 committee to provide a modern and comprehensive definition of C reflecting the enhanced role of this language. The resulting definition, known as Standard or ANSI C, was finally approved by ANSI at the end of 1989 and adopted internationally by ISO in 1990.

Apart from its use as the language of choice for embedded MPU/MCU circuits, C (together with its C++ and Java object-oriented offspring) is without doubt the most popular general-purpose programming language at the time of writing. It has been called by its detractors a high-level assembler. However, this closeness of C to assembly-level code, together with the ability to mix code based on both levels in the one program, is of particular benefit for embedded targets.
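As a sketch of that closeness to the hardware, the following standard C fragment sets and clears individual bits of an 8-bit register through a pointer, the kind of ‘poking about’ in peripheral registers described earlier. The helper names reg_set_bit() and reg_clear_bit() are hypothetical. On a real target the pointer would hold a fixed register address (on the PIC16F84, for example, PORTB is File 06h), and a PIC compiler may offer its own extensions for this. The volatile qualifier tells the compiler that every read and write must really take place, so the accesses cannot be optimized away.

```c
#include <stdint.h>

/* Hypothetical helpers: set or clear one bit of an 8-bit register.
   On a target, reg would point at a fixed register address; on a
   desktop machine any uint8_t object can stand in for it. */
static inline void reg_set_bit(volatile uint8_t *reg, unsigned bit)
{
    *reg |= (uint8_t)(1u << bit);    /* read-modify-write: OR in the bit */
}

static inline void reg_clear_bit(volatile uint8_t *reg, unsigned bit)
{
    *reg &= (uint8_t)~(1u << bit);   /* AND with the inverted bit mask */
}
```

Because an ordinary variable can stand in for the register, routines like these can be exercised on a desktop machine before being moved to the target.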

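The main advantages of using a high-level language as source code for embedded targets follow from the points made above:

• The code is largely independent of the underlying processor’s architecture, so it can be moved to a new MCU and reused with far less rewriting.

• The syntax is oriented to the problem rather than to registers, memory and flags, which improves programmer productivity and the quality of the code.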

Of course there are disadvantages as well, specifically when code is being produced to run in poorly resourced MPU/MCU-based circuitry.

• The code produced is less space-efficient and often runs more slowly than native assembly code.

• The compiler is much more expensive than an assembler. A professional product will often cost several thousand pounds/dollars.

• Debugging can be difficult, as the code actually executed by the target processor is the machine code generated from the compiler’s assembly-level output; the processor does not execute high-level code directly. Products that facilitate high-level debugging are, again, very expensive.

Program 9.1 is an example of a C function (a function is C’s counterpart to a subroutine) that evaluates the relationship:

$$\mathrm{sum} = \sum_{k=1}^{n} k = n + (n-1) + \cdots + 2 + 1$$

For example, if n = 5 then we have:

$$\mathrm{sum} = 5 + 4 + 3 + 2 + 1 = 15$$

In the implementation n is the integer passed to the function, which computes and returns the integer sum as defined. The program implements this task by continually adding n to the pre-cleared sum, as n is decremented to zero.
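As a check on the example, the sum also has the well-known closed form below. With n held as an 8-bit unsigned integer its largest value is 255, and even then the sum still fits in the 16-bit unsigned long used for sum in Program 9.1.

$$
\mathrm{sum} = \sum_{k=1}^{n} k = \frac{n(n+1)}{2}, \qquad
n = 5:\ \frac{5 \times 6}{2} = 15, \qquad
n = 255:\ \frac{255 \times 256}{2} = 32\,640 \le 65\,535
$$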

Let us dissect it line by line. Each line is labelled with its number. This is for clarity in our discussion and is not part of the program.

Line 1: This line names the function (subroutine) summation and declares that it returns an unsigned long integer (a 16-bit unsigned object in the compiler used to illustrate this topic) and expects to be passed an unsigned integer called n (an 8-bit unsigned object in the same compiler).

Line 2: A left brace { means begin. Every begin must be matched by an end, which is designated by a right brace }. It is good practice to indent each begin from the immediately preceding line(s), as this makes it easier to check that each begin is paired with an end. However, the compiler is oblivious to the style the programmer uses. In this case line 10 is the corresponding end brace, and between lines 2 and 10 is the body of the function summation().

Program 9.1 A simple function coded in C.
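A sketch of Program 9.1, pieced together from the line-by-line description in this section, follows; the exact layout of the original listing may differ. The line numbers used in the discussion appear as trailing comments so that the code still compiles. On a desktop C compiler unsigned int and unsigned long are usually wider than the 8 and 16 bits of the compiler used in this topic, but the results are the same for any n from 0 to 255.

```c
/* Program 9.1 as described in the text: returns n + (n-1) + ... + 1. */
unsigned long summation(unsigned int n)   /* line 1 */
{                                         /* line 2 */
    unsigned long sum = 0;                /* line 3 */
    while (n > 0)                         /* line 4 */
    {                                     /* line 5 */
        sum = sum + n;                    /* line 6 */
        --n;                              /* line 7 */
    }                                     /* line 8 */
    return sum;                           /* line 9 */
}                                         /* line 10 */
```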


Line 3: There is only one variable that is local to our function, and its name and type are defined here: sum is of type unsigned long. In C all objects have to be defined before they are used. The definition tells the compiler what properties the named variable has, for example its size (16 bits), so that storage can be allocated, and its arithmetic properties (unsigned). At the same time sum is given an initial value of zero. The complete statement is terminated by a semicolon, as are all statements in C.

Line 4: In evaluating sum we need to repeat the same process as long as n is greater than zero. This is the purpose of the while construction introduced in this line. The general form of this loop construct is:

    while(expression)
    {
        statement;
        statement;
        ...
    }

The body of the loop, that is, the set of statements between the left and right braces that follow (lines 5 and 8), is executed repeatedly as long as the expression in the brackets evaluates as non-zero; anything non-zero is considered true by C. This test is made before each pass through the body. In our case the expression n>0 is evaluated. If it is true, then n is added to sum, n is decremented and the loop test is repeated. Eventually n>0 evaluates to false (zero) when n reaches zero, and execution continues with the statement following the closing brace (line 9).
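The two points just made, that the test is made before each pass and that any non-zero value counts as true, can be seen in the following sketch. The function name count_passes() is hypothetical; it simply counts how many times the loop body runs. For an unsigned n the test n > 0 is false only when n is zero, so writing the test as plain while (n) gives the same behaviour, and when n starts at zero the body is never executed at all.

```c
/* Hypothetical helper: counts the passes through a while loop that
   is controlled by the test n (true for any non-zero value). */
unsigned int count_passes(unsigned int n)
{
    unsigned int passes = 0;
    while (n)                    /* same as n > 0 for an unsigned n */
    {
        passes = passes + 1;     /* one more pass through the body */
        --n;
    }
    return passes;               /* 0 when n starts at 0: the test fails first */
}
```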

Line 5: The opening brace defining the while body. Notice that for style it is indented.

Line 6: The expression to the right of the assignment operator = is evaluated as sum + n, and the resulting value is given to the variable on the left, sum. In adding an 8-bit to a 16-bit variable, C automatically extends the 8-bit operand to 16 bits (see Table 9.1, lines 14 and 15).
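A minimal sketch of line 6 using the fixed-width types of stdint.h, where uint8_t and uint16_t stand in for the compiler’s 8-bit int and 16-bit long; the function name add_byte_to_word() is hypothetical. Strictly, standard C first promotes both operands to int (at least 16 bits wide) before adding them, and the result is then converted to the 16-bit type of sum, so the total is not cut down to 8 bits.

```c
#include <stdint.h>

/* Hypothetical helper mirroring line 6: add an 8-bit n to a 16-bit sum. */
uint16_t add_byte_to_word(uint16_t sum, uint8_t n)
{
    sum = sum + n;   /* operands promoted, result stored back as 16 bits */
    return sum;
}
```

For instance, adding 200 to 100 gives 300 rather than the 44 that an 8-bit result would hold, while a total that overflows 16 bits wraps round modulo 65 536.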

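Line 7: The value of n is decremented, as commanded by the -- Decrement operator.³ This is equivalent to the statement n = n - 1; As an alternative, many C programmers would fold the decrement into line 6 as sum = sum + n--; where the postfix form uses the current value of n and then decrements it. Moving a prefix decrement into the loop test instead, as in while(--n > 0), would be a mistake here, as the original value of n would never be added to sum.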
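The following sketch compares two ways of folding the decrement into another statement; both function names are hypothetical. summation_postfix() puts n-- inside the addition, so the current value of n is added before it is decremented and the result matches Program 9.1. summation_pretest() moves a prefix decrement into the loop test, so n is decremented before it is first added and that first value is lost.

```c
/* Postfix decrement inside the addition: adds n, then decrements it. */
unsigned long summation_postfix(unsigned int n)
{
    unsigned long sum = 0;
    while (n > 0)
        sum = sum + n--;
    return sum;
}

/* Prefix decrement in the test: n is reduced BEFORE the first addition.
   Note that for n == 0, --n wraps round to the largest unsigned value. */
unsigned long summation_pretest(unsigned int n)
{
    unsigned long sum = 0;
    while (--n > 0)
        sum = sum + n;
    return sum;
}
```

With n = 5 the first returns 15 and the second only 10 (4 + 3 + 2 + 1); the pre-test version also misbehaves badly for n = 0.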

Line 8: The end brace for the while body. Again note how the opening (line 5) and closing braces line up. The compiler does not give a hoot about style; this is solely for human readability and to reduce the possibility of errors.

Line 9: The return statement passes one value back to the caller, in this case the completed value of sum. If necessary, the compiler converts this value to the type given at the start of the function header in line 1, that is unsigned long. This returned value is the value of the function, so a function call can be used in an expression in the same way as a variable. Thus, if we had a function called sqr_root() that returned the square root of a constant passed to it (see Program 9.2), then the statement in the calling program:

    x = sqr_root(y);

would assign the returned value of sqr_root(y) to x.

Line 10: The closing brace for function summation().
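To illustrate a function value being used like a variable, here is a hypothetical integer square root for an 8-bit argument. It is not the book’s Program 9.2, simply a short stand-in that returns the largest r for which r * r does not exceed y, found by trying successive candidates.

```c
/* Hypothetical stand-in, NOT Program 9.2: integer square root of an
   8-bit value, i.e. the largest r with r * r <= y. */
unsigned char sqr_root(unsigned char y)
{
    unsigned int r = 0;
    while ((r + 1) * (r + 1) <= y)   /* does the next candidate still fit? */
        r = r + 1;
    return (unsigned char)r;         /* at most 15 for y <= 255 */
}
```

Because each call yields a value, calls can appear anywhere an expression can, for example x = sqr_root(y); or sqr_root(16) + sqr_root(9), which evaluates to 7.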

We see from Fig. 9.1 that the output from the compiler is assembly-level code, which can then be assembled and linked with other modules⁴ in the normal way. To illustrate this process, Table 9.1(a) shows the assembly-level code generated when the C code of Program 9.1 is passed through the Custom Computer Services, Inc. (CCS) C cross-compiler. This is a low-cost C compiler (< $100) that can be integrated with MPLAB (see Fig. 9.3). The resulting listing file of Table 9.1(a) shows each line of C source code as a comment, together with the resulting assembly-level code. Two minor changes were made to the source code to generate this illustrative listing:

• The function was renamed from summation() to main(), as every C program must have at least a main() function. This root function is similar to any other C function, but it causes the compiler to set up the software environment (see below).

• The initial #pragma directive tells the compiler to generate code suitable for the PIC16F84 device.

It is instructive to look at how the compiler has translated this program.

long main(int n)

Entry to the main() function is always via the Reset vector at 000h. First the PCLATH SPR is zeroed, and then execution jumps past the Interrupt vector to the start of the main block of code at 005h. Here the Status and File Select registers are cleared.

Table 9.1 Resulting assembly-level CCS compiler output after linking.

(a) Assembly-level code listing file generated by the CCS compiler (CCS PCW C Compiler, Version 2.606, 5056).

(b) Executable Intel machine-code file.

This initialization phase is a feature of the main() function so that the ‘useful’ code can run from Reset in a known software state or environment. A C program typically comprises many functions but only main() will set up this environment.

long sum = 0;

The CCS compiler reserves two bytes for a long object. In this case Files 12h and 13h hold the low and high bytes of sum respectively. To zero these two GPRs the compiler has generated two clrf instructions:

    clrf 12h
    clrf 13h

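sum = sum + n;

This is implemented as an add single byte to double byte operation: n in File 11h is added to the low byte of sum in File 12h, and any carry out of that addition is then added to the high byte in File 13h.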
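The add single byte to double byte operation can be modelled in C as below, with the two bytes of sum held separately as in Files 12h and 13h. The function name add_byte_to_double_byte() is hypothetical, and this models the arithmetic rather than reproducing the CCS output: on a PIC it is typically an addwf into the low byte followed by a test of the Carry flag, with the high byte incremented only when a carry has occurred. Here the carry is detected by noticing that the 8-bit low byte has wrapped round to a smaller value.

```c
#include <stdint.h>

/* Hypothetical model: add the single byte n to the double byte hi:lo. */
void add_byte_to_double_byte(uint8_t *lo, uint8_t *hi, uint8_t n)
{
    uint8_t old_lo = *lo;
    *lo = (uint8_t)(*lo + n);        /* low-byte add, kept to 8 bits */
    if (*lo < old_lo)                /* wrapped round: a carry occurred */
        *hi = (uint8_t)(*hi + 1);    /* propagate the carry to the high byte */
}
```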

Many C programmers use the alternative statement sum += n; which reads as ‘sum augmented by n’.

--n;

Now decrement the single byte in File 11h.

while(n>0){

The compiler has allocated File 11h for the single-byte int object n. n has been given a value by the calling function, which has placed a datum in File 11h for this function to operate on.

The while statement is implemented by testing n for zero and, if it is zero, jumping to the exit return statement.

Otherwise n is added to sum and then decremented.
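The branch structure just described can be mirrored in C with goto, as in the sketch below. It is a hand-written model of one way a compiler can lower the while loop, not the CCS compiler’s actual output, and goto is used only to make the jumps visible; it is not recommended style. The function name summation_lowered() is hypothetical.

```c
/* Hypothetical model of the lowered while loop of Program 9.1. */
unsigned long summation_lowered(unsigned int n)
{
    unsigned long sum = 0;
loop_test:
    if (n == 0)          /* n > 0 is false for an unsigned n only when n == 0 */
        goto loop_exit;  /* jump to the exit return statement */
    sum = sum + n;       /* loop body: add n to sum ... */
    --n;                 /* ... then decrement n */
    goto loop_test;      /* back to the test at the top */
loop_exit:
    return sum;
}
```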
