Java Reference
In-Depth Information
scanned as one identifier rather than three. But now consider the character
sequence 1..10. In Pascal and Ada, this should be interpreted as a range
specifier (1 to 10). However, if we were careless in our token definitions, we
might well scan 1..10 as two constants, 1. and .10,whichwouldleadtoan
immediate (and unexpected) syntax error. The fact that two constants cannot
be adjacent is reflected in the context-free grammar (CFG), which is enforced
by the parser, not the scanner.
When a formal specification of token and program structure is given, it
is possible to examine a language for design flaws. For example, we could
analyze all pairs of tokens that can be adjacent to each other and determine
whether the two, if catenated, might be incorrectly scanned. If so, a separator
may be required. In the case of adjacent identifiers and reserved words, a
blank space (whitespace) su
ces to distinguish the two tokens. Sometimes,
though, the lexical or program syntax might need to be redesigned. The point
is that language design is far more involved than one might expect, and formal
specifications allow flaws to be discovered before the design is completed.
All scanners, independent of the tokens to be recognized, perform much
the same function. Thus, writing a scanner from scratch means reimplement-
ing components that are common to all scanners; this leads to a significant
duplication of e
ort of
building a scanner to that of specifying which tokens the scanner is to recog-
nize. Using a formal notation, we tell the scanner generator what tokens we
want recognized. It is then the generator's responsibility to produce a scan-
ner that meets our specification. Some generators do not produce an entire
scanner. Rather, they produce tables that can be used with a standard driver
program, and this combination of generated tables and standard driver yields
the desired custom scanner.
Programming a scanner generator is an example of declarative program-
ming . That is, unlike in ordinary, or procedural programming ,wedonottell
a scanner generator how to scan but simply what to scan. This is a higher-level
approach and in many ways a more natural one. Much recent research in
computer science is directed toward declarative programming styles; exam-
ples are database query languages and Prolog, a “logic” programming lan-
guage. Declarative programming is most successful in limited domains, such
as scanning, where the range of implementation decisions that must be made
automatically is limited. Nonetheless, a long-standing (and as yet unrealized)
goal of computer scientists is to generate an entire production-quality compiler
automatically from a specification of the properties of the source language and
target computer.
Although our primary focus in this topic is on producing correct compilers,
performance is sometimes a real concern, especially in widely used “produc-
tion compilers.” Surprisingly, even though scanners perform a simple task,
they can be significant performance bottlenecks if poorly implemented. This
ff
ort. The goal of a scanner generator is to limit the e
ff
 
Search WWH ::




Custom Search