Scanning—Theory and Practice - Crafting a Compiler

Java Reference

In-Depth Information

scanned as one identifier rather than three. But now consider the character

sequence 1..10. In Pascal and Ada, this should be interpreted as a range

specifier (1 to 10). However, if we were careless in our token definitions, we

might well scan 1..10 as two constants, 1. and .10,whichwouldleadtoan

immediate (and unexpected) syntax error. The fact that two constants cannot

be adjacent is reflected in the context-free grammar (CFG), which is enforced

by the parser, not the scanner.

When a formal specification of token and program structure is given, it

is possible to examine a language for design flaws. For example, we could

analyze all pairs of tokens that can be adjacent to each other and determine

whether the two, if catenated, might be incorrectly scanned. If so, a separator

may be required. In the case of adjacent identifiers and reserved words, a

blank space (whitespace) su

ces to distinguish the two tokens. Sometimes,

though, the lexical or program syntax might need to be redesigned. The point

is that language design is far more involved than one might expect, and formal

specifications allow flaws to be discovered before the design is completed.

All scanners, independent of the tokens to be recognized, perform much

the same function. Thus, writing a scanner from scratch means reimplement-

ing components that are common to all scanners; this leads to a significant

duplication of e

ort of

building a scanner to that of specifying which tokens the scanner is to recog-

nize. Using a formal notation, we tell the scanner generator what tokens we

want recognized. It is then the generator's responsibility to produce a scan-

ner that meets our specification. Some generators do not produce an entire

scanner. Rather, they produce tables that can be used with a standard driver

program, and this combination of generated tables and standard driver yields

the desired custom scanner.

Programming a scanner generator is an example of declarative program-

ming . That is, unlike in ordinary, or procedural programming ,wedonottell

a scanner generator how to scan but simply what to scan. This is a higher-level

approach and in many ways a more natural one. Much recent research in

computer science is directed toward declarative programming styles; exam-

ples are database query languages and Prolog, a “logic” programming lan-

guage. Declarative programming is most successful in limited domains, such

as scanning, where the range of implementation decisions that must be made

automatically is limited. Nonetheless, a long-standing (and as yet unrealized)

goal of computer scientists is to generate an entire production-quality compiler

automatically from a specification of the properties of the source language and

target computer.

Although our primary focus in this topic is on producing correct compilers,

performance is sometimes a real concern, especially in widely used “produc-

tion compilers.” Surprisingly, even though scanners perform a simple task,

they can be significant performance bottlenecks if poorly implemented. This

ff

ort. The goal of a scanner generator is to limit the e

ff

Crafting a Compiler

Search WWH ::

Custom Search

Home