Many hand-coded scanners treat reserved words as ordinary identifiers
(as far as matching tokens is concerned) and then use a separate table lookup
to detect them. Automatically generated scanners can also use this approach,
especially if transition table size is an issue. After an apparent identifier is
scanned, an exception table is consulted to see if a reserved word has been
matched. When case is significant in reserved words, the exception lookup
requires an exact match. Otherwise, the token should be translated to a
standard form (all uppercase or all lowercase) before the lookup.
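As a sketch of this approach (the class and method names below are illustrative, not taken from any particular compiler), a scanner might consult a small exception table after matching an apparent identifier, normalizing case first when the language ignores it:

```java
import java.util.Set;

// Sketch of reserved-word detection via an exception table.
public class ExceptionTable {
    // An illustrative fragment of a language's reserved words.
    private static final Set<String> RESERVED =
        Set.of("if", "else", "while", "return");

    /** Returns true if the scanned identifier is actually a reserved word.
     *  When case is not significant, the lexeme is translated to a
     *  standard (lowercase) form before the lookup. */
    public static boolean isReserved(String lexeme, boolean caseSignificant) {
        String key = caseSignificant ? lexeme : lexeme.toLowerCase();
        return RESERVED.contains(key);
    }
}
```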
There are several ways of organizing an exception table. One obvious
mechanism is a sorted list of exceptions suitable for a binary search. A hash
table also may be used. For example, the length of a token may be used as
an index into a list of exceptions of the same length. If exception lengths are
well distributed, then few comparisons will be needed to determine whether
a token is an identifier or a reserved word. Perfect hash functions are also
possible [Spr77, Cic80]. That is, each reserved word is mapped to a unique
position in the exception table and no position in the table is unused. A token is
either the reserved word selected by the hash function or an ordinary identifier.
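A length-indexed exception table might be sketched as follows (the word lists shown are an illustrative fragment, not a complete reserved-word set):

```java
import java.util.List;
import java.util.Map;

// Sketch of an exception table indexed by token length: the length
// selects a short list of reserved words, so only a few string
// comparisons are needed to decide identifier versus reserved word.
public class LengthIndexedTable {
    private static final Map<Integer, List<String>> BY_LENGTH = Map.of(
        2, List.of("if", "do"),
        4, List.of("else", "case"),
        5, List.of("while", "break"));

    public static boolean isReserved(String lexeme) {
        return BY_LENGTH.getOrDefault(lexeme.length(), List.of())
                        .contains(lexeme);
    }
}
```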
If identifiers are entered into a string space or given a unique serial number
by the scanner, then reserved words can be entered in advance. Then, when
a string that looks like an identifier is found to have a serial number or string
space position smaller than the initial position assigned to identifiers, we know
that a reserved word rather than an identifier has been scanned. In fact, with
a little care we can assign initial serial numbers so that they exactly match the
token codes used for reserved words. That is, if an identifier is found to have
a serial number s, where s is less than the number of reserved words, then s
must be the correct token code for the reserved word just scanned.
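A minimal sketch of such a string table in Java (class and method names invented for illustration): reserved words are entered first, so their serial numbers coincide with their token codes, and any serial number below the reserved-word count identifies a reserved word.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a scanner string table in which reserved words are entered
// in advance, so a small serial number is itself the token code.
public class StringTable {
    private final Map<String, Integer> serials = new HashMap<>();
    private int next = 0;
    private final int numReserved;

    public StringTable(String... reservedWords) {
        for (String w : reservedWords)
            serials.put(w, next++);   // codes 0 .. numReserved-1
        numReserved = reservedWords.length;
    }

    /** Returns the serial number, assigning a fresh one on first sight. */
    public int lookup(String lexeme) {
        return serials.computeIfAbsent(lexeme, k -> next++);
    }

    /** A serial below the reserved-word count is itself the token code. */
    public boolean isReserved(int serial) {
        return serial < numReserved;
    }
}
```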
3.7.2 Using Compiler Directives and Listing Source Lines
Compiler directives and pragmas control compiler options (for example, list-
ings, source file inclusion, conditional compilation, optimizations, and profil-
ing). They may be processed either by the scanner or by subsequent compiler
phases. If the directive is a simple flag, then it can be extracted from a token.
The command is then executed, and finally the token is deleted. More elabo-
rate directives, such as Ada pragmas, have nontrivial structure and need to be
parsed and translated like any other statement.
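The handling of a simple flag directive can be sketched as follows (the directive spellings and option are invented for illustration): the scanner recognizes the directive as a token, executes its effect immediately, and deletes the token rather than passing it on.

```java
// Sketch of a scanner consuming a simple flag directive.
public class DirectiveScanner {
    private boolean listingEnabled = false;

    /** Returns true if the lexeme was a directive and has been consumed;
     *  ordinary tokens return false and are passed on to the parser. */
    public boolean handleDirective(String lexeme) {
        switch (lexeme) {
            case "$LIST_ON":  listingEnabled = true;  return true;
            case "$LIST_OFF": listingEnabled = false; return true;
            default:          return false;
        }
    }

    public boolean listingEnabled() {
        return listingEnabled;
    }
}
```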
A scanner may have to handle source inclusion directives. These directives
cause the scanner to suspend the reading of the current file and begin the
reading and scanning of the contents of the specified file. Since an included
file may itself contain an include directive, the scanner maintains a stack of
open files. When the file at the top of the stack is completely scanned, it is
popped and scanning resumes with the file now at the top of the stack. When
the entire stack is empty, end-of-file is recognized and scanning is completed.
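The stack of open files described above can be sketched as follows (a simplified stand-in for a real scanner's input handling, which would also track file names and line numbers): an include directive pushes a new source, the top source is read until exhausted and then popped, and end-of-file is reported only when the entire stack is empty.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of nested source inclusion using a stack of open readers.
public class IncludeStack {
    private final Deque<Reader> open = new ArrayDeque<>();

    /** An include directive suspends the current file and pushes the new one. */
    public void include(Reader source) {
        open.push(source);
    }

    /** Next character, or -1 once every stacked file is exhausted. */
    public int read() {
        try {
            while (!open.isEmpty()) {
                int c = open.peek().read();
                if (c != -1) return c;
                open.pop().close();   // top file finished: resume the one below
            }
            return -1;                // entire stack empty: true end-of-file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```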