Because C has a rather elaborate macro definition and expansion facility, macro
processing and included files are typically handled by a preprocessing phase
prior to scanning and parsing. The preprocessor, cpp, may in fact be used with
languages other than C to obtain the effects of source file inclusion, macro
processing, and so on.
Some languages (such as C and PL/I) include conditional compilation
directives that control whether statements are compiled or ignored. Such
directives are useful in creating multiple versions of a program from a common
source. Usually, these directives have the general form of an if statement;
hence, a conditional expression will be evaluated. Characters following the
expression will either be scanned and passed to the parser, or ignored until
an end if delimiter is reached. If conditional compilation structures can be
nested, a skeletal parser for the directives may be needed.
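
The nesting problem described above can be illustrated with a small sketch. The directive syntax used here (`%if NAME`, `%else`, `%endif`) and the function name `filter_conditional` are hypothetical, chosen only for illustration; they are not the syntax of C, PL/I, or any real preprocessor. The stack plays the role of the skeletal parser: directives inside an ignored region are still tracked so that each `%endif` is matched with the right `%if`.

```python
def filter_conditional(lines, defined):
    """Return the lines that survive conditional compilation.

    Hypothetical directive syntax (an assumption, not from the text):
        %if NAME ... %else ... %endif
    A line is kept only if every enclosing condition is currently true.
    """
    stack = []  # one [condition, seen_else] pair per open %if
    kept = []
    for number, line in enumerate(lines, start=1):
        words = line.split()
        head = words[0] if words else ""
        if head == "%if":
            if len(words) != 2:
                raise SyntaxError(f"line {number}: %if needs one name")
            # Track the directive even inside an ignored region,
            # so that nesting stays balanced.
            stack.append([words[1] in defined, False])
        elif head == "%else":
            if not stack or stack[-1][1]:
                raise SyntaxError(f"line {number}: unexpected %else")
            stack[-1][0] = not stack[-1][0]
            stack[-1][1] = True
        elif head == "%endif":
            if not stack:
                raise SyntaxError(f"line {number}: unexpected %endif")
            stack.pop()
        elif all(active for active, _ in stack):
            kept.append(line)  # would be scanned and passed to the parser
    if stack:
        raise SyntaxError("missing %endif at end of input")
    return kept
```

In a real scanner the surviving characters would be tokenized rather than collected as whole lines; working line by line simply keeps the sketch short.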
Another function of the scanner is to list source lines and to prepare for the
possible generation of error messages. While straightforward, this requires a
bit of care. The most obvious way to produce a source listing is to echo
characters as they are read, using end-of-line characters to terminate a line,
increment line counters, and so on. However, this approach has a number of
shortcomings:
• Error messages may need to be printed. These should appear merged
  with source lines, with pointers to the offending symbol.
• A source line may need to be edited before it is written. This may involve
  inserting or deleting symbols (for example, for error repair), replacing
  symbols (because of macro preprocessing), and reformatting symbols
  (to prettyprint a program, that is, to print a program with text properly
  indented, if-else pairs aligned, and so on).
• Source lines that are read are not always in a one-to-one correspondence
  with source listing lines that are written. For example, in Unix a source
  program can legally be condensed into a single line (Unix places no limit
  on line lengths). A scanner that attempts to buffer entire source lines
  may well overflow buffer lengths.
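
As an illustration of the first shortcoming, the following minimal sketch merges error messages into a numbered listing and places a caret under the offending symbol. The function name `listing_with_errors` and the `(line, column, message)` error format are assumptions made for this example. The sketch also assumes that the source contains no tab characters, since tabs would throw off the caret position.

```python
def listing_with_errors(source_lines, errors):
    """Merge error messages into a numbered source listing.

    `errors` is a list of (line_number, column, message) tuples with
    1-based line numbers and 0-based columns (a hypothetical format).
    """
    by_line = {}
    for line_no, column, message in errors:
        by_line.setdefault(line_no, []).append((column, message))

    out = []
    for line_no, text in enumerate(source_lines, start=1):
        prefix = f"{line_no:4d}  "
        out.append(prefix + text)
        # Each message follows its source line, with a caret placed
        # under the column of the offending symbol.
        for column, message in sorted(by_line.get(line_no, [])):
            out.append(" " * (len(prefix) + column) + "^ " + message)
    return out
```

For example, `listing_with_errors(["x = 1", "y = x +* 2"], [(2, 7, "unexpected '*'")])` yields the two numbered source lines followed by a pointer line whose caret sits under the `*`.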
In light of these considerations, it is best to build output lines (which
normally are bounded by device limits) incrementally as tokens are scanned.
The token image placed in the output buffer may not be an exact image of
the token that was scanned, depending on error repair, prettyprinting, case
conversion, or whatever else is required. If a token cannot fit on an output
line, then the line is written and the buffer is cleared. (To simplify editing, you
should place source line numbers in the program's listing.) In rare cases, a
token may need to be broken; for example, if a string is so long that its text
exceeds the output line length.
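
The incremental approach described above might be sketched as follows. The class name `ListingBuffer`, its methods, and the choice to separate token images with a single space are assumptions made for illustration. A real scanner would choose spacing according to its prettyprinting rules and would write completed lines to a listing file rather than collect them in a list.

```python
class ListingBuffer:
    """Builds bounded-width output lines one token image at a time."""

    def __init__(self, width):
        if width < 1:
            raise ValueError("width must be positive")
        self.width = width
        self.current = ""  # the partially built output line
        self.lines = []    # completed output lines

    def flush(self):
        """Write out the current line, if any, and clear the buffer."""
        if self.current:
            self.lines.append(self.current)
            self.current = ""

    def add(self, image):
        """Append a token image, separated from its neighbor by a space."""
        if not image:
            return
        needed = len(image) + (1 if self.current else 0)
        if len(self.current) + needed > self.width:
            self.flush()  # the token does not fit: write the line first
        # Rare case: the image alone exceeds the line width, so break it.
        while len(image) > self.width:
            self.lines.append(image[:self.width])
            image = image[self.width:]
        self.current += (" " if self.current else "") + image
```

A caller adds each token image as it is scanned and calls `flush()` once at end of input. Because the buffer never holds more than one output line, its size is bounded by the output width, no matter how long the source lines are.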