Assembly language Part 1 (PIC Microcontroller)

We have now been writing programs with gay abandon since topic 3. For clarity these listings have been written in a human-readable form. Thus instructions have been represented as a short mnemonic, such as return instead of 00000000001000b; the file registers similarly have names, such as INTCON; lines have been labelled and comments attached. Such symbolic representations are only for human consumption.

With the help of the device’s instruction set, see topic A, it is possible to translate from the human-readable symbolic form to machine-readable binary. This is not particularly difficult for a device such as a PIC that has a reduced set of instructions (RISC) and few address modes. However, it is slow and tedious, especially where programs of a significant length are being coded. Furthermore, it is error prone and difficult to maintain whenever there are changes to be made.
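The hand translation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a Microchip tool: the table covers a handful of mid-range (14-bit) PIC instructions, and the names `OPCODES` and `encode` are invented for this example.

```python
# Illustrative hand-translation table for a few mid-range (14-bit) PIC
# instructions. OPCODES and encode() are hypothetical names for this sketch.
# Each entry: mnemonic -> (base opcode, operand kind).
OPCODES = {
    "return": (0x0008, "none"),       # no operand
    "movwf":  (0x0080, "file_only"),  # 7-bit file register address
    "clrf":   (0x0180, "file_only"),
    "addwf":  (0x0700, "file"),       # 7-bit address plus destination bit
    "incf":   (0x0A00, "file"),
    "movlw":  (0x3000, "literal"),    # 8-bit constant
    "call":   (0x2000, "addr"),       # 11-bit Program store address
    "goto":   (0x2800, "addr"),
}

def encode(mnemonic, operand=0, dest=1):
    """Return the 14-bit machine word for one symbolic instruction."""
    base, kind = OPCODES[mnemonic.lower()]
    if kind == "none":
        return base
    if kind == "file_only":
        return base | (operand & 0x7F)
    if kind == "file":
        # dest = 0 selects the Working register, dest = 1 the file register
        return base | ((dest & 1) << 7) | (operand & 0x7F)
    if kind == "literal":
        return base | (operand & 0xFF)
    return base | (operand & 0x7FF)

print(format(encode("return"), "014b"))      # binary form of return
print(format(encode("movlw", 0x05), "04X"))  # hexadecimal form of movlw 5
```

Even for this tiny subset the bit-field bookkeeping is fiddly, which is the point being made: the job is mechanical and therefore better left to a program.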

Computers are good at doing boring things quickly and accurately; and translating from symbolic to machine code definitely falls into this category. Here we will briefly look at the various software packages that aid in this translation process.

After reading this topic you will:

• Know what assembly-level language is and how it relates to machine code.

• Appreciate the advantages of a symbolic representation over machine-readable code.

• Understand the function of the assembler.

• Understand the difference between absolute and relocatable assembly.


• Understand the role of a linker.

• Appreciate the process involved in translating and locating an assembly-level language program to absolute machine code.

• Understand the structure of a machine-code file and the role of the loader program.

• Understand the role of a simulator.

• Appreciate the use of the integrated development environment to automate the interaction of the various software tools needed to convert source code into a programmed MCU device.

The essence of the conversion process is shown in Fig. 8.1. Here the program is prepared by the tame human in symbolic form, digested by the computer and output in machine-readable form. Of course this simple statement belies a rather more complex process, and we want to examine this in just enough detail to help you in writing your programs.

Fig. 8.1 Conversion from assembly-level source code to machine code.

In general the various translator and utility packages are written and sold by many software companies, and thus the actual details and procedures differ somewhat between the various commercial products. In the specific case of PIC MCU devices, Microchip Technology Inc. has as a matter of policy always provided its assembly-level software tools free of charge, a large factor in their popularity. As a result commercial PIC software is relatively rare, and what there is usually conforms to the Microchip syntax. We will therefore illustrate this topic with the Microchip suite of computer-aided coding tools.

Using the computer to aid in translating code from more user-friendly forms (known as source code) to machine-friendly binary code (known as object code or machine code), and in loading this into memory, began in the late 1940s for mainframe computers. At the very least it permitted the use of higher-order number bases, such as hexadecimal. In this base the code fragment of Fig. 8.1 becomes:

[Listing not reproduced: the code fragment of Fig. 8.1 written in hexadecimal.]

A hexadecimal loader will translate this into binary and put the code in designated memory locations. This loader might be part of the software in your PIC-EPROM programmer. Hexadecimal coding has little to commend it, except that the number of keystrokes is reduced – but there are more keys – and it is slightly easier to spot certain types of errors.
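A hexadecimal loader of this kind can be sketched as follows. The record layout used here, an address followed by a colon and a run of four-digit words, is an assumption made purely for illustration; it is not the file format a real PIC programmer reads, and the sample words are arbitrary rather than the code of Fig. 8.1.

```python
def load_hex(text, memory=None):
    """Load lines of the assumed form 'AAA: WWWW WWWW ...' into a memory map.

    memory maps a Program store address to a 14-bit instruction word.
    The record format is hypothetical, chosen only for this sketch.
    """
    memory = {} if memory is None else memory
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        addr_field, words = line.split(":")
        address = int(addr_field, 16)     # start address of this record
        for word in words.split():
            memory[address] = int(word, 16) & 0x3FFF   # keep 14 bits
            address += 1                  # next word goes in the next location
    return memory

# Arbitrary sample words, not taken from the text
image = load_hex("200: 3005 00A0\n202: 0008")
for address in sorted(image):
    print(f"{address:03X}h  {image[address]:014b}")
```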

As a minimum, a symbolic translator, or assembler, is required for serious programming. This allows the programmer to use mnemonics for the instructions and internal registers, with names for constants, variables and addresses. The symbolic language used in the source code is known as assembly language. Unlike high-level languages, such as C or PASCAL, assembly language has a one-to-one relationship with the generated machine code, i.e. one line of source code produces one instruction. As an example, Program 8.1 shows a slightly modified version of Program 6.11. This subroutine computes the square root of a 16-bit variable called NUM which has been allocated two bytes in the Data store.

Giving names to addresses and constants is especially valuable for longer programs, which may easily exceed 1000 lines. Together with the use of comments, this makes code easier to debug, develop and maintain. Thus, if we wished to alter the file registers holding the variable NUM from File 20:21h to, say, File 36:37h, then we need only alter the line:

cblock 20h

to:

cblock 36h

and then retranslate to machine code. In a program with, say, 50 references to the variable NUM, the alternative of altering all these addresses manually from 20h or 21h (high:low byte) to 36h or 37h respectively is laborious and error prone. In the body of our source code the high byte is referenced as NUM (that is, the contents of File 20h) and the low byte in File 21h as NUM+1, as assemblers can do simple arithmetic on symbolic constants.

The pseudo instruction cblock is an example of an assembler directive. A directive is a command from the programmer to the assembler concerning its operation or giving a constant a name. We list a small subset of the Microchip assembler directives at the end of the topic; the reader should consult the official manual for a detailed description. Briefly, the directives used in Program 8.1 are:

cblock – endc

Rather like a block of equ directives, giving the encapsulated list of labels consecutive constant values, starting either at the specified value, e.g. 20h, or following on from the last cblock if no address is given. A labelled entity can be deemed to occupy more than one byte by using a colon-delimited size field; for instance NUM:2 for a 2-byte allocation. These directives are normally used to name General-Purpose registers (GPRs).
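The allocation rule for cblock can be modelled in Python as below. This is a sketch of the behaviour described above, not the assembler's own code; the function name `cblock` and the extra labels COUNT and TEMP are hypothetical, with only NUM:2 taken from the text.

```python
def cblock(names, start=None, last_end=0):
    """Assign consecutive addresses to a list of labels.

    names    : entries such as "NUM:2" (2 bytes) or "COUNT" (1 byte)
    start    : explicit start address, or None to follow on from the
               previous block (whose end is passed in as last_end)
    Returns ({label: address}, next free address).
    """
    address = last_end if start is None else start
    symbols = {}
    for entry in names:
        name, _, size = entry.partition(":")
        symbols[name] = address
        address += int(size) if size else 1   # default allocation is 1 byte
    return symbols, address

# COUNT and TEMP are invented labels for this example
symbols, next_free = cblock(["NUM:2", "COUNT"], start=0x20)
more, next_free = cblock(["TEMP"], last_end=next_free)

# Simple arithmetic on a symbolic constant: NUM is the high byte, NUM+1 the low
print(hex(symbols["NUM"]), hex(symbols["NUM"] + 1), hex(symbols["COUNT"]))
```

Changing `start=0x20` to `start=0x36` moves every label in the block in one step, which is the maintenance benefit claimed for symbolic names.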

Program 8.1 Absolute assembly-level code for our square-root module.

end

Tells the assembler that this is the end of the source code.

equ

Associates a value to a symbol. For instance the assembler replaces the name STATUS by the value 3 anywhere it appears in an instruction operand. Normally used for Special-Purpose registers (SPRs) and bits within file registers.

org

Specifies the start address for the code that follows; otherwise the assembler defaults to 000h in the Program store. In this program the subroutine SQR_ROOT is originated at 200h.
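The effect of org and end on where code is placed can be pictured as a location counter that the assembler advances as it emits each instruction. The Python below is a minimal sketch of that idea; the function name `locate` and the item encoding (an `("org", address)` tuple, the string `"end"`, or an already-translated instruction word) are assumptions made for this example.

```python
def locate(items):
    """Place instruction words in the Program store using a location counter.

    items may contain ("org", address) to move the counter, the string
    "end" to stop, or an integer instruction word to be stored.
    """
    program = {}
    counter = 0x000                 # default origin when no org is given
    for item in items:
        if isinstance(item, tuple) and item[0] == "org":
            counter = item[1]       # following code starts here
        elif item == "end":
            break                   # anything after end is ignored
        else:
            program[counter] = item
            counter += 1            # one instruction per location
    return program

# Arbitrary instruction words; the subroutine body is originated at 200h
placed = locate([0x2A00, ("org", 0x200), 0x3005, 0x0008, "end", 0x3FFF])
for address in sorted(placed):
    print(f"{address:03X}h  {placed[address]:04X}")
```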

Of course symbolic translators demand more computing power than simple hexadecimal loaders, especially in the area of memory and backup store. Prior to the introduction of personal computers in the late 1970s, mainframes, minicomputers or special-purpose MPU/MCU development systems were required to implement the assembly process. Such implementations were inevitably expensive and inhibited the use of such computer aids, and hand-assembled coding was relatively common. Translation software thus implements two tasks:

• Conversion of the various instruction mnemonics and labels to their machine-code equivalents.

• The location of the instructions and data in the appropriate memory location.

It is the second of these that is perhaps more difficult to understand.

Program 8.2 is designed to be processed by an absolute assembler. Here the programmer uses the directive org to tell the assembler to place the code in the specified Program store address. This means that the programmer needs to know where everything is to be placed. This absolute assembly process is shown in Fig. 8.2. Absolute assembly is adequate where a program is contained in a single self-contained file, which is the case for the majority of code in this text.

However, real projects often consist of several thousand lines of code and require teamwork. With many modules being written by different people, perhaps also coming in from outside sources and libraries, some means must be found to link the appropriate modules together to give the one executable machine-code file. For example, you may have to call up a division subroutine that Fred wrote some time ago. You will not know exactly where in memory this subroutine will reside until the project has been completed. What can you do? Well, a subroutine should have its entry point labelled; say, DIV in this case. You should be able to direct the assembler to give this label the attribute that its absolute value is to be found later by a linker program. We will look at this relocatable way of working later on in the topic.
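The idea of leaving a label such as DIV to be resolved later can be sketched in Python. This is a deliberately simplified model of linking, not a description of the Microchip linker: the module layout (a `labels` table of offsets plus a `code` list in which an unresolved call is held as an `(opcode, symbol)` pair) and the function name `link` are assumptions for this example, and the instruction words are arbitrary.

```python
def link(modules, base=0x000):
    """Join modules end to end and patch references to labels.

    Each module is a dict with:
      "labels": {name: offset within the module}
      "code"  : list of words; an unresolved reference is (opcode, name)
    Returns (image, symbols) where image maps address -> word.
    """
    # Pass 1: give each module a start address and record absolute labels
    symbols, starts, address = {}, [], base
    for module in modules:
        starts.append(address)
        for name, offset in module["labels"].items():
            symbols[name] = address + offset
        address += len(module["code"])

    # Pass 2: copy the code, filling in addresses that were left open
    image = {}
    for start, module in zip(starts, modules):
        for offset, word in enumerate(module["code"]):
            if isinstance(word, tuple):
                opcode, name = word
                word = opcode | (symbols[name] & 0x7FF)  # 11-bit address field
            image[start + offset] = word
    return image, symbols

# Your module calls DIV without knowing where it will end up
main = {"labels": {"MAIN": 0}, "code": [(0x2000, "DIV"), 0x0008]}
# Stand-in for Fred's division subroutine (body is arbitrary)
fred = {"labels": {"DIV": 0}, "code": [0x3000, 0x0008]}

image, symbols = link([main, fred])
print(f"DIV resolved to {symbols['DIV']:03X}h, call word {image[0]:04X}")
```

Relinking with a different `base`, or with the modules in another order, changes the value of DIV without either source file being edited.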

Most programs running on the low- and mid-range PICs are adequately handled by an absolute assembler. To clarify the process we will take the subroutine of Fig. 8.2 through from the creation of the source file to the final absolute machine-code file.

Editing

Initially the source file must be created using a text editor. A text editor differs from a wordprocessor in that no embedded control codes, giving formatting and other information, are inserted.

Fig. 8.2 Absolute assembly-level code translation.

For instance, there is no line wrapping; if you want a new line then you hit the [ENT] key. Most operating systems come with a simple text editor; for example, notepad for Microsoft's Windows. Third-party products are also available and most word processors have a text mode which can double as a program editor. Microchip-compatible assembly-level source file names have the extension .src.

The format of a typical line of source code looks like:

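[Listing not reproduced: a source line showing its fields, namely an optional label, the instruction, any operand(s) and an optional comment.]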

With the exception of comment-only lines, all lines must contain an instruction (either executable by the MCU or a directive) and any relevant operand or operands. Any label must begin in column 1, otherwise the first character must be a space or a tab to indicate no label. A label can be up to 32 alphanumeric, underline or question mark characters with the proviso that the first character be an underline or letter. Labels are usually case sensitive. A line label names the Program store address of the first following executable instruction.
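The label rules above translate directly into a short check. The Python below is a sketch of those rules as stated in the text only; the names `LABEL`, `is_valid_label` and `split_label` are invented here, and a real assembler may accept or reject forms that this model does not cover.

```python
import re

# First character: letter or underline; then up to 31 more characters that
# are alphanumeric, underline or question mark (32 characters in total).
LABEL = re.compile(r"[A-Za-z_][A-Za-z0-9_?]{0,31}")

def is_valid_label(name):
    """True if name obeys the label rules described in the text."""
    return LABEL.fullmatch(name) is not None

def split_label(line):
    """Return (label or None, rest of line) using the column-1 rule."""
    if not line or line[0] in " \t;":
        return None, line.strip()      # no label: space, tab or comment first
    parts = line.split(None, 1)        # label ends at the first white space
    label = parts[0]
    if not is_valid_label(label):
        raise ValueError(f"bad label: {label!r}")
    return label, parts[1].strip() if len(parts) > 1 else ""

print(split_label("SQR_ROOT clrf COUNT"))   # label found in column 1
print(split_label("\tmovlw 5"))             # leading tab, so no label
```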

An optional comment is delineated by a semicolon, and whole-line comments are permitted – see lines 11-18 of Program 8.1. Comments are ignored by the assembler and are there solely for human-readable documentation. Notes should be copious and should explain what the program is doing, and not simply repeat the instruction. For example:

[Listing not reproduced: an instruction whose comment merely restates it.]

is a waste of energy:

[Listing not reproduced: an instruction with a comment explaining its purpose.]

is rather more worthwhile. Not, or minimally, commenting source code is a frequent failing, not confined to students. A poorly documented program is difficult to debug and subsequently to alter or extend. The latter is sometimes known as program maintenance.

Space should separate the instruction from any operand. Where there are two operands the source and destination fields are delineated by a comma. In instructions where the destination can be the Working register or the addressed file register, the predefined names w or f should appear in the destination fields or numbers 0 or 1 respectively. The assembler will default to destination file if omitted.
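The operand rules in the previous paragraph can be sketched as a small parser. This is an illustrative model in Python rather than the assembler's actual behaviour; the names `parse_statement`, `destination` and `DEST` are hypothetical, the input is assumed to have had any label already removed, and a semicolon inside a quoted string is not handled.

```python
def parse_statement(text):
    """Split 'instruction operand,dest ; comment' into its fields.

    Returns (mnemonic, [operands], comment), or None for a line with
    no instruction (blank or comment only).
    """
    code, _, comment = text.partition(";")   # comment starts at the semicolon
    fields = code.split(None, 1)             # space separates instruction and operands
    if not fields:
        return None
    mnemonic = fields[0]
    operands = []
    if len(fields) > 1:
        operands = [op.strip() for op in fields[1].split(",")]
    return mnemonic, operands, comment.strip()

# w or 0 selects the Working register, f or 1 the addressed file register
DEST = {"w": 0, "f": 1, "0": 0, "1": 1}

def destination(operands):
    """Destination bit; defaults to the file register (1) if omitted."""
    if len(operands) < 2:
        return 1
    return DEST[operands[1].lower()]

mnemonic, operands, comment = parse_statement("addwf NUM+1,w ; add in the low byte")
print(mnemonic, operands, destination(operands), comment)
```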
