is described in more detail below. Chapter 2 presents a simple compiler to
provide concrete examples of many of the concepts introduced in this overview.
1.5.1 The Scanner
The scanner begins the analysis of the source program by reading the input
text (character by character) and grouping individual characters into tokens
such as identifiers, integers, reserved words, and delimiters. This is the first
of several steps that produce successively higher-level representations of the
input. The tokens are encoded (often as integers) and fed to the parser for
syntactic analysis. When necessary, the actual character string comprising the
token is also passed along for use by the semantic phases. The scanner does
the following:
It puts the program into a compact and uniform format (a stream of
tokens).
It eliminates unneeded information (such as comments).
It processes compiler control directives (for example, turning the listing on
or off and including source text from a specified file).
It sometimes enters preliminary information into symbol tables (for
example, to register the presence of a particular label or identifier).
It optionally formats and lists the source program.
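The duties listed above can be made concrete with a small hand-written scanner. This is a minimal sketch, not the book's implementation: the class name MiniScanner, the token kinds, the reserved-word set, the delimiter set, and the `//` comment syntax are all hypothetical choices made for illustration. It shows characters being grouped into identifiers, integers, reserved words, and delimiters; token kinds encoded as enum constants (whose ordinals serve as integer codes); the matched text carried along for later phases; and comments discarded. It assumes Java 16 or later for the record type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class MiniScanner {
    // Token kinds; each constant's ordinal acts as its integer encoding.
    public enum Kind { ID, INTLIT, RESERVED, DELIM, EOF }

    // A token pairs its kind with the matched character string,
    // which later (semantic) phases may need.
    public record Token(Kind kind, String text) {
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // Hypothetical reserved words and delimiters, chosen for illustration.
    private static final Set<String> RESERVED = Set.of("if", "else", "while", "int");
    private static final String DELIMS = "+-*/=;(){}<>,";

    public static List<Token> scan(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            // Whitespace separates tokens but is not itself a token.
            if (Character.isWhitespace(c)) { i++; continue; }
            // Comments ("//" to end of line) are unneeded information: skip them.
            if (c == '/' && i + 1 < src.length() && src.charAt(i + 1) == '/') {
                while (i < src.length() && src.charAt(i) != '\n') i++;
                continue;
            }
            int start = i;
            if (Character.isLetter(c)) {
                // Identifier or reserved word: letter followed by letters/digits.
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                tokens.add(new Token(RESERVED.contains(word) ? Kind.RESERVED : Kind.ID, word));
            } else if (Character.isDigit(c)) {
                // Integer literal: one or more digits.
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(new Token(Kind.INTLIT, src.substring(start, i)));
            } else if (DELIMS.indexOf(c) >= 0) {
                // Single-character delimiter.
                i++;
                tokens.add(new Token(Kind.DELIM, String.valueOf(c)));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        tokens.add(new Token(Kind.EOF, ""));
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the token stream handed to the parser.
        System.out.println(scan("while (x1 < 10) x1 = x1 + 2; // loop"));
    }
}
```

Distinguishing reserved words by a table lookup after matching the identifier pattern is one common design; it keeps the character-level logic identical for both kinds of word.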
The main action of building tokens is often driven by token descriptions.
Regular expression notation (discussed in Chapter 3) is an effective approach
to describing tokens. Regular expressions are a formal notation sufficiently
powerful to describe the variety of tokens required by modern programming
languages. In addition, they can be used as a specification for the automatic
generation of finite automata (discussed in Chapter 3) that recognize regular
sets, that is, the sets that regular expressions define. Recognition of regular sets
is the basis of the scanner generator. A scanner generator is a program that
actually produces a working scanner when given only a specification of the
tokens it is to recognize. Scanner generators are a valuable compiler-building
tool.
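The link between regular expressions and finite automata can be illustrated with the kind of transition table a scanner generator might emit. In this sketch the table is built by hand rather than generated, for two hypothetical token classes: identifiers, letter (letter | digit)*, and integers, digit digit*. The class name TableDrivenDfa, the character classes, and the state numbering are illustrative assumptions. The driver loop is generic: only the table encodes the regular sets being recognized.

```java
public class TableDrivenDfa {
    // Character classes form the automaton's input alphabet.
    private static final int LETTER = 0, DIGIT = 1, OTHER = 2;

    // States: START, two accepting states, and a dead (error) state.
    private static final int START = 0, IN_ID = 1, IN_INT = 2, ERROR = 3;

    // Transition table T[state][charClass], hand-built for
    //   ID  = letter (letter | digit)*
    //   INT = digit digit*
    private static final int[][] T = {
        /* START  */ { IN_ID, IN_INT, ERROR },
        /* IN_ID  */ { IN_ID, IN_ID,  ERROR },
        /* IN_INT */ { ERROR, IN_INT, ERROR },
        /* ERROR  */ { ERROR, ERROR,  ERROR },
    };

    private static int classOf(char c) {
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    /** Runs the DFA over the whole string and reports which regular set accepts it, if any. */
    public static String classify(String s) {
        int state = START;
        for (int i = 0; i < s.length(); i++) {
            state = T[state][classOf(s.charAt(i))];
        }
        switch (state) {
            case IN_ID:  return "ID";
            case IN_INT: return "INT";
            default:     return "REJECT"; // START (empty input) or ERROR
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] { "count2", "2048", "2x", "", "a_b" }) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Here "count2" is classified as ID and "2048" as INT, while "2x", the empty string, and "a_b" (the underscore falls outside both character classes) are rejected. A production scanner would also apply a longest-match rule to split a character stream into successive tokens rather than classify whole strings.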
1.5.2 The Parser
The parser is based on a formal syntax specification such as a context-free
grammar (CFG). It reads
tokens and groups them into phrases according to the syntax specification.
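As an illustration of a parser grouping tokens into phrases, here is a minimal recursive-descent sketch for a hypothetical expression grammar (not one given in the text). It takes an already-scanned token list and returns each recognized phrase fully parenthesized, so the grouping the parser discovered is visible. Recursive descent is only one of several parsing techniques; the class name MiniParser, the grammar, and the "<EOF>" sentinel are assumptions made for this example.

```java
import java.util.List;

public class MiniParser {
    // Hypothetical grammar, chosen for illustration:
    //   Expr   -> Term { "+" Term }
    //   Term   -> Factor { "*" Factor }
    //   Factor -> identifier | integer | "(" Expr ")"
    private final List<String> tokens;
    private int pos = 0;

    public MiniParser(List<String> tokens) { this.tokens = tokens; }

    // Returns the current token, or the sentinel "<EOF>" past the end.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected " + t + " but found " + peek());
        pos++;
    }

    // One method per nonterminal; each returns its phrase fully parenthesized.
    private String expr() {
        String left = term();
        while (peek().equals("+")) { pos++; left = "(" + left + " + " + term() + ")"; }
        return left;
    }

    private String term() {
        String left = factor();
        while (peek().equals("*")) { pos++; left = "(" + left + " * " + factor() + ")"; }
        return left;
    }

    private String factor() {
        String t = peek();
        if (t.equals("(")) { pos++; String e = expr(); expect(")"); return e; }
        if (t.matches("[A-Za-z][A-Za-z0-9]*|[0-9]+")) { pos++; return t; }
        throw new IllegalStateException("unexpected token " + t);
    }

    public String parse() {
        String result = expr();
        if (pos != tokens.size()) throw new IllegalStateException("trailing token " + peek());
        return result;
    }

    public static void main(String[] args) {
        // prints (a + (b * (c + 1)))
        System.out.println(new MiniParser(List.of("a", "+", "b", "*", "(", "c", "+", "1", ")")).parse());
    }
}
```

Because Term sits below Expr in the grammar, "*" binds more tightly than "+"; the grammar's structure alone determines how the phrases nest.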
Grammars are discussed in Chapters 2 and 4, and parsing is discussed in