3
Scanning—Theory and
Practice
In this chapter, we discuss the theoretical and practical issues involved in
building a scanner. For the purposes of crafting a compiler, the scanner's job
(as introduced in Section 2.4 on page 38) is to translate an input stream of
characters into a stream of tokens, each corresponding to a terminal symbol of
a programming language. More generally, scanners perform specified actions
triggered by an associated pattern of input characters. Techniques related to
scanning are found in most software components that are tasked with
identifying structure in their input. For example, the processing of network
packets, the display of Web pages, and the interpretation of digital video and
audio media all require some form of scanning.
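To make the scanner's job concrete, the following is a minimal hand-written sketch in Python of translating characters into tokens. It is not taken from the chapter: the token kinds (NUM, ID, OP), the Token type, and the scan function are hypothetical names chosen for illustration, and the tiny "language" it recognizes is an assumption.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # name of the terminal symbol (illustrative: "NUM", "ID", "OP")
    text: str  # the input characters that make up the token


def scan(chars: str) -> Iterator[Token]:
    """Translate a string of characters into a stream of tokens."""
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isspace():
            i += 1  # whitespace only separates tokens, so skip it
        elif c.isdigit():
            start = i
            while i < len(chars) and chars[i].isdigit():
                i += 1  # consume the whole run of digits
            yield Token("NUM", chars[start:i])
        elif c.isalpha():
            start = i
            while i < len(chars) and chars[i].isalnum():
                i += 1  # a letter followed by letters or digits
            yield Token("ID", chars[start:i])
        elif c in "+-*/=()":
            yield Token("OP", c)  # single-character operator or punctuation
            i += 1
        else:
            raise ValueError(f"unexpected character {c!r} at offset {i}")


tokens = list(scan("total = count + 42"))
for token in tokens:
    print(token.kind, token.text)
```

Each branch pairs a pattern of input characters with an action (skip, emit a token, or report an error), which is the more general view of scanning described above.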
In Section 3.1, we give an overview of how a scanner operates. Section 3.2
revisits the declarative regular expression notation introduced in Section 2.2 on
page 33, which is particularly well suited to the formal definition of tokens
and the automatic generation of scanners. In Section 3.4, the correspondence
between regular expressions and finite automata is studied. Section 3.5
considers a widely used scanner generator, Lex, as a case study. Lex uses regular
expressions to produce a complete scanner component, ready to be compiled
and deployed on its own or as part of a larger project. Section 3.6 briefly
considers other scanner generators.
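As a rough illustration of defining tokens declaratively, the sketch below drives a scanner from a table of regular expressions using Python's standard re module. It is not Lex and does not reproduce Lex's behavior: the token names and patterns are assumptions made for the example, and Python's alternation takes the first alternative that matches rather than the longest match, so the order of the table matters.

```python
import re

# Hypothetical token definitions: (token name, regular expression).
TOKEN_SPEC = [
    ("NUM", r"[0-9]+"),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"[ \t\n]+"),  # whitespace: matched, then discarded
]

# Combine the definitions into one pattern with a named group per token.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (token name, matched text) pairs for the input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"no token matches at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()  # continue scanning after the matched text
    return tokens


print(tokenize("x1 = y + 10"))
```

Here the scanning loop never changes; adding a token means adding one line to the table, which is the appeal of the declarative approach.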
In Section 3.7, we discuss the practical considerations needed to build
a scanner and integrate it with the rest of a compiler. These considerations